Curated Claude Code catalog
Updated 07.05.2026 · 19:39 CET
01 / Skill
Upsonic

Quality
9.0

Upsonic is a Python framework designed for building both autonomous and traditional AI agents, offering capabilities to automate complex tasks and integrate with external services. It excels in scenarios requiring robust agent orchestration, custom tool integration, and efficient document processing with built-in OCR.

USP

It uniquely supports both autonomous and traditional agent paradigms, provides prebuilt agents for quick deployment, and includes a unified OCR interface, making it a versatile toolkit for diverse AI automation and data extraction needs.

Use cases

  • Building autonomous AI agents for complex task automation
  • Creating traditional AI agents with custom tools for specific workflows
  • Automating data analysis and anomaly detection from logs
  • Processing documents and extracting text using OCR
  • Managing and evaluating machine learning experiments

Detected files (8)

  • prebuilt_autonomous_agents/applied_scientist/skills/evaluate/SKILL.md (skill, 4698 bytes)
    # Evaluate Skill
    
    ## Purpose
    Compare baseline and new implementation results. Produce the machine-readable final report `result.json`, update `experiments.json`, and append a row to `comparison.json`.
    
    ## When to Use
    Phase 5 — after the new implementation is complete and metrics are collected.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    | research_name | string | Name of this experiment |
    
    ## Actions
    
    1. **Collect all metrics** from `log.json` (Phase 3 baseline entry + Phase 4 new method entry).
    
    2. **Determine verdict:**
       - `BETTER`: new method outperforms baseline on the majority of key metrics
       - `WORSE`: new method underperforms baseline on the majority of key metrics
       - `INCONCLUSIVE`: mixed results or differences within noise margin
       - `FAILED`: experiment could not produce comparable results (dependency failure, implementation crash, data incompatibility)
    
    3. **Write `{experiment_path}/result.json`** in the exact schema below. Always valid JSON; never leave fields undefined — use `null` for unknown values.
    
       ```json
       {
         "name": "{research_name}",
         "verdict": "BETTER",
         "summary": "2-3 paragraphs explaining what the new method does, how it fundamentally differs from the baseline, and what trade-offs it makes.",
         "explanation": "2-3 sentences explaining WHY this verdict was reached. Reference specific metrics and their differences. Be concrete — mention numbers, not vague statements.",
         "comparison": {
           "metrics": [
             {
               "name": "accuracy",
               "current": 0.853,
               "new":     0.872,
               "diff":    0.019,
               "diff_display": "+0.019",
               "unit": null,
               "higher_is_better": true,
               "better": "new"
             },
             {
               "name": "training_time_seconds",
               "current": 2.0,
               "new":     45.0,
               "diff":    43.0,
               "diff_display": "+43.0",
               "unit": "seconds",
               "higher_is_better": false,
               "better": "current"
             }
           ]
         },
         "file_locations": {
           "current_notebook":   "experiments/{research_name}/current.ipynb",
           "current_data":       "experiments/{research_name}/current_data/",
           "new_notebook":       "experiments/{research_name}/new.ipynb",
           "research_source":    "experiments/{research_name}/research.pdf",
           "experiment_log":     "experiments/{research_name}/log.json"
         }
       }
       ```
    
       ### Field rules
       - `verdict`: exactly one of `"BETTER"`, `"WORSE"`, `"INCONCLUSIVE"`, `"FAILED"`.
       - `summary` / `explanation`: plain text, no markdown headings. Short paragraphs only.
       - `comparison.metrics[]`:
         - `current` / `new` are numbers (or `null` if a side could not compute the metric).
         - `diff = new - current` (raw number). `diff_display` is the short string with sign (`"+0.019"`, `"-0.03"`).
         - `better`: `"new"` | `"current"` | `"tie"` | `null` — computed from `diff` and `higher_is_better`.
         - `unit` is a short unit string (`"seconds"`, `"%"`, etc.) or `null`.
       - `file_locations` uses paths relative to the experiments directory root. `research_source` must match whatever Phase 0 materialized — `research.pdf`, `research_source.{ext}`, or the `research_source/` directory for a cloned repo.
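
        A minimal Python sketch of how the `diff` / `better` fields and the step 2 majority-vote verdict could be derived (helper names are illustrative and every metric is treated as a key metric):

        ```python
        def compare_metric(metric, noise=1e-9):
            """Fill diff/diff_display and return "new", "current", "tie", or None."""
            if metric["current"] is None or metric["new"] is None:
                return None
            diff = metric["new"] - metric["current"]
            metric["diff"] = diff
            metric["diff_display"] = f"{diff:+g}"
            if abs(diff) <= noise:
                return "tie"
            improved = diff > 0 if metric["higher_is_better"] else diff < 0
            return "new" if improved else "current"

        def decide_verdict(metrics):
            results = [compare_metric(m) for m in metrics]
            for metric, result in zip(metrics, results):
                metric["better"] = result
            if all(result is None for result in results):
                return "FAILED"              # no side produced comparable values
            new_wins = results.count("new")
            current_wins = results.count("current")
            if new_wins > current_wins:
                return "BETTER"
            if new_wins < current_wins:
                return "WORSE"
            return "INCONCLUSIVE"            # mixed results or differences within the noise margin
        ```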
    
    4. **Update `experiments/experiments.json`:**
       - Set `status` to `"completed"` (or `"failed"` if the experiment failed).
       - Fill in `verdict`, `key_metric`, `baseline_model`, `new_method`.
       - `key_metric` is an object: `{"name": "...", "baseline": <num>, "new": <num>}`.
    
    5. **Update `experiments/comparison.json`:**
       - If the file does not exist, create it with `{"experiments": []}`.
       - Append an entry:
         ```json
         {
           "name": "{research_name}",
           "date": "YYYY-MM-DD",
           "baseline": "{baseline_model}",
           "new_method": "{new_method}",
           "key_metric": {"name": "accuracy", "baseline": 0.853, "new": 0.872},
           "verdict": "BETTER"
         }
         ```
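
        A minimal sketch of the create-if-missing-then-append behaviour described above (the `entry` values reuse the example numbers from this document; the experiment name is a placeholder):

        ```python
        import json
        from pathlib import Path

        entry = {
            "name": "example_experiment",
            "date": "2026-04-17",
            "baseline": "XGBoost",
            "new_method": "CatBoost",
            "key_metric": {"name": "accuracy", "baseline": 0.853, "new": 0.872},
            "verdict": "BETTER",
        }

        comparison_path = Path("experiments/comparison.json")
        if not comparison_path.exists():
            comparison_path.write_text(json.dumps({"experiments": []}, indent=2))

        comparison = json.loads(comparison_path.read_text())
        comparison["experiments"].append(entry)
        comparison_path.write_text(json.dumps(comparison, indent=2))
        ```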
    
    6. **Update `{experiment_path}/log.json`** — append a Phase 5 entry:
       ```json
       {
         "name": "Phase 5: Evaluate",
         "completed_at": "2026-04-17T11:40:00Z",
         "verdict": "BETTER",
         "key_change": "accuracy +0.019 (new > current)",
         "files_written": ["result.json", "experiments.json", "comparison.json"]
       }
       ```
    
    ## Output
    - `{experiment_path}/result.json` — the final machine-readable report.
    - `experiments/experiments.json` — updated with this experiment's final verdict.
    - `experiments/comparison.json` — new row appended.
    - `log.json` — finalized with Phase 5 entry.
    
  • prebuilt_autonomous_agents/applied_scientist/skills/experiment_management/SKILL.md (skill, 7331 bytes)
    # Experiment Management Skill
    
    ## Purpose
    Set up and manage the experiment folder structure. This is Phase 0 — it runs before any analysis begins. All bookkeeping files are JSON (never markdown).
    
    ## When to Use
    - At the very start of a new experiment
    - When updating `experiments.json` or `comparison.json` after an experiment completes
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | research_name | string | The experiment name **as given by the caller**. Use it verbatim — do not rename it, do not re-derive it from the source title. |
    | research_source | ref | A free-form reference describing the new method. The caller can pass anything that identifies the content — a local file path, any URL (blog post, arXiv, docs, Hugging Face page, …), a git repository, a Kaggle link, a paper ID, or **a plain text idea** describing the approach to try. Do not reject unusual values; investigate and fetch whatever was given, or, for pure text ideas, save the text verbatim. |
    | current_notebook | path | Path to the current baseline .ipynb |
    | current_data | ref \| placeholder | Path to the current dataset (file or directory), a short description of how the notebook loads data, **or** the literal placeholder `"(not provided — infer it from the current notebook's data-loading cells)"`. When you see that placeholder, read the current notebook yourself and infer the source from its data-loading cells; do not ask the user. |
    | experiments_directory | path | The directory (inside the workspace) where experiment folders live (e.g. `./experiments`). |
    
    ## Actions
    
    ### Setup (start of experiment)
    
    1. **Create experiment directory:**
       ```
       experiments/{research_name}/
       ```
    
    2. **Copy baseline files (NEVER move, NEVER modify originals):**
       ```bash
       cp {current_notebook} experiments/{research_name}/current.ipynb
       # Only when current_data is a real path on disk:
       cp -r {current_data}  experiments/{research_name}/current_data/
       ```
    
       Resolve `{current_data}` as follows before copying:
    
       - **Real local path** (file or directory that exists on disk) → `cp` / `cp -r` it into `current_data/`.
       - **Short description of a code-based download** (e.g. `"downloaded in notebook (ucimlrepo, id=2)"`) → leave `current_data/` empty and record the description verbatim in `log.json.metadata.original_data`.
       - **Placeholder `"(not provided — infer it from the current notebook's data-loading cells)"`** → open `current.ipynb` yourself, scan for data-loading cells (`pd.read_csv`, `fetch_openml`, `fetch_ucirepo`, `load_dataset`, `kaggle.api...`, `urllib`/`requests` downloads, `np.load`, local paths, …), write down the exact loader you found as `log.json.metadata.original_data`, and make sure Phase 4's `new.ipynb` uses the same loader. Do not ask the user for clarification — do the investigation yourself.
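
        A minimal sketch of that data-loading scan, assuming the standard `.ipynb` JSON layout (the pattern list is illustrative, not exhaustive; record whatever loader you actually find):

        ```python
        import json
        import re

        # Call patterns that usually reveal where the baseline notebook gets its data.
        LOADER_PATTERNS = [
            r"pd\.read_\w+\([^)]*\)",
            r"fetch_openml\([^)]*\)",
            r"fetch_ucirepo\([^)]*\)",
            r"load_dataset\([^)]*\)",
            r"kaggle\.api\.\w+\([^)]*\)",
            r"(?:urllib|requests)\.\w+\([^)]*\)",
            r"np\.load\([^)]*\)",
        ]

        def find_data_loaders(notebook_path):
            """Return the raw loader calls found in code cells, verbatim."""
            with open(notebook_path) as f:
                nb = json.load(f)
            hits = []
            for cell in nb.get("cells", []):
                if cell.get("cell_type") != "code":
                    continue
                source = "".join(cell.get("source", []))
                for pattern in LOADER_PATTERNS:
                    hits.extend(re.findall(pattern, source))
            return hits  # record these in log.json metadata as original_data
        ```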
    
    3. **Materialize the research source.** `{research_source}` can be anything — a local file, a URL of any kind, a git or Kaggle link, an arXiv / paper ID, a Hugging Face page, **or a plain text idea** describing the method to try. Your job is to bring its content into the experiment folder using whatever tool fits:
    
       - **Investigate first.** Check if the value is a path on disk (`ls` / `test -e`); if it looks like a URL, poke it with `curl -I`; look at the hostname; read any hint in the value itself. If the value does not look like a path or URL at all, treat it as a **text idea** (see below). Do not rely on a fixed detection list.
       - **Retrievable source** → fetch it with the most appropriate tool: `cp`, `git clone --depth 1`, `curl -L` / `wget`, `kaggle kernels pull …` / `kaggle datasets download …`, `huggingface-cli`, an arXiv PDF helper, Python downloaders, or anything else available. Install a missing CLI with `pip install` / `uv pip install` if it is the right tool for this source.
       - **Text idea** → do not fabricate a paper or URL. Save the description verbatim to `experiments/{research_name}/research_source.md` (optionally with a leading `# Idea` heading) and let Phase 2 turn it into a concrete implementation plan.
       - **Pick a sensible local name** based on what you actually produced:
         - A single PDF → `experiments/{research_name}/research.pdf`
         - Any other single file (including a text idea) → `experiments/{research_name}/research_source.{ext}` (`.md` for ideas; preserve the real extension for files you copied)
         - Multiple files or a cloned repo / dataset → `experiments/{research_name}/research_source/` (a directory)
         - A fetched HTML page → `research_source.html`, optionally with a cleaned `research_source.md` and/or a linked `research.pdf`
       - **Choose your own `research_source_kind` label** to describe what you did (e.g. `pdf`, `file`, `git`, `kaggle_notebook`, `kaggle_dataset`, `arxiv`, `huggingface_model`, `html`, `idea`, `other`). This label is just for observability — there is no closed enum.
       - **If fetching fails**, try an alternative (different CLI, raw HTML fallback, `curl` instead of a specialized tool). Only after genuine failure, log the attempts in `log.json`, mark the experiment `FAILED`, and explain what you tried in `result.json.explanation`. A text idea can never "fail to fetch" — it is always saved verbatim.
    
       Let `research_source_local` be whichever local path you produced. Use that path for Phase 2 onwards — never re-fetch.
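
        A minimal first-pass triage sketch for the "investigate first" step (a starting heuristic only, since the skill deliberately avoids a fixed detection list; unknown values still deserve a closer look before being treated as text ideas):

        ```python
        import os
        from urllib.parse import urlparse

        def classify_research_source(value: str) -> str:
            """Rough triage of {research_source}; fetching still uses whichever tool fits."""
            if os.path.exists(value):
                return "file" if os.path.isfile(value) else "directory"
            parsed = urlparse(value)
            if parsed.scheme in ("http", "https", "git", "ssh"):
                host = parsed.netloc.lower()
                if "arxiv.org" in host:
                    return "arxiv"
                if "kaggle.com" in host:
                    return "kaggle_notebook"   # or kaggle_dataset, depending on the link
                if "huggingface.co" in host:
                    return "huggingface_model"
                if "github.com" in host or value.endswith(".git"):
                    return "git"
                return "html"                  # generic page: curl -L / wget fallback
            # Neither a path nor an obvious URL: likely a text idea, saved verbatim.
            return "idea"
        ```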
    
    4. **Create `log.json`** with the starting skeleton:
       ```json
       {
         "name": "{research_name}",
         "metadata": {
           "date": "YYYY-MM-DD",
           "original_notebook":      "{current_notebook}",
           "original_data":          "{current_data}",
           "research_source":        "{research_source_local}",
           "research_source_origin": "{research_source}",
           "research_source_kind":   "a short label you pick, e.g. pdf, file, git, kaggle_notebook, kaggle_dataset, arxiv, huggingface_model, html, idea, other"
         },
         "phases": []
       }
       ```
       Phases append entries here as they finish; never overwrite earlier entries.
    
     5. **Register in `experiments/experiments.json`:**
       - If the file does not exist, create it with `{"experiments": []}`.
       - Append a new entry with `status: "in_progress"`:
         ```json
         {
           "name": "{research_name}",
           "date": "YYYY-MM-DD",
           "status": "in_progress",
           "paper": "{paper_title}",
           "baseline_model": null,
           "new_method": null,
           "verdict": null,
           "key_metric": null,
           "path": "experiments/{research_name}/"
         }
         ```
    
     6. **Create initial `progress.json`** (see `skills/progress/SKILL.md` for schema) with:
       - `status: "RUNNING"`
       - all phases listed, all `pending`, Phase 0 marked `current`
       - `started_at` and `updated_at` set to now (UTC ISO-8601)
    
    ### Finalize (end of experiment)
    
    1. Update the experiment entry in `experiments.json` with final `status`, `verdict`, and `key_metric` (see `skills/evaluate/SKILL.md`).
    2. Append a row to `experiments/comparison.json`.
       - If the file does not exist, create it with `{"experiments": []}`.
    
    ## Output
    - `experiments/{research_name}/` directory with copied files
    - `experiments/{research_name}/log.json` initialized
    - `experiments/{research_name}/progress.json` initialized
    - `experiments/experiments.json` updated
    
  • prebuilt_autonomous_agents/applied_scientist/skills/implement/SKILL.md (skill, 3537 bytes)
    # Implement Skill
    
    ## Purpose
    Create a new Jupyter notebook implementing the method from the research paper, using the same data as the baseline. Record implementation details and measured metrics as a structured JSON entry.
    
    ## When to Use
    Phase 4 — after benchmark metrics are defined and baseline values are extracted.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    
    ## Actions
    
    1. **Install dependencies:**
       - Install any new packages identified in Phase 2.
       - Capture installed package names and versions for the log entry below.
    
    2. **Write `{experiment_path}/new_requirements.txt`:**
       - List all packages the new notebook needs (one per line, `package==version`).
       - Include both existing dependencies and new ones from the paper.
    
    3. **Create `{experiment_path}/new.ipynb`** with this structure:
    
       ```
       [Markdown] # {Research Name} - New Method Implementation
       [Markdown] ## 1. Setup & Imports
       [Code]     import statements + dependency checks
    
       [Markdown] ## 2. Data Loading
       [Code]     load from experiments/{research_name}/current_data/
                  (use the SAME data loading logic as current.ipynb)
    
       [Markdown] ## 3. Data Preprocessing
       [Code]     preprocessing as required by the new method
                  (note any differences from baseline preprocessing)
    
       [Markdown] ## 4. Model Implementation
       [Code]     implement the new method from the paper
    
       [Markdown] ## 5. Training
       [Code]     train the model
                  (use same train/test split as baseline for fair comparison)
    
       [Markdown] ## 6. Evaluation
       [Code]     compute ALL comparison metrics defined in Phase 3
    
       [Markdown] ## 7. Results Summary
       [Code]     print all metrics in a structured format
       ```
    
    4. **Implementation rules:**
       - Use the SAME train/test split (same random seed, same ratio) as the baseline.
       - Use the SAME data — load from `current_data/`, do not download new data.
       - Compute ALL metrics defined in Phase 3 (including any with `"needs_computation": true`).
       - Add timing measurements for training (`training_time_seconds`).
       - Handle errors gracefully — if the method fails, log why.
        - **Efficiency:** if data is large (100K+ rows), sample it to a manageable size (10K–30K rows). Both notebooks must use the exact same sample. Use the paper's recommended hyperparameters — do not run exhaustive grid searches. If training takes more than 10 minutes, reduce the data size or simplify the config. The goal is a fair comparison, not a production model.
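
        A minimal sketch of reusing the baseline split and recording `training_time_seconds` (the toy data and the LogisticRegression stand-in are placeholders; the real notebook loads the shared data and implements the method from the research source):

        ```python
        import time
        import numpy as np
        from sklearn.linear_model import LogisticRegression   # stand-in for the new method
        from sklearn.model_selection import train_test_split

        # Placeholder data; the real notebook loads from current_data/ (or the same
        # programmatic loader the baseline uses).
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 14))
        y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

        # Same split parameters as the baseline (Phase 1 log entry: split, seed, stratified).
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )

        start = time.perf_counter()
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        training_time_seconds = time.perf_counter() - start
        print(f"training_time_seconds: {training_time_seconds:.2f}")
        ```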
    
    5. **Run the notebook** end-to-end and verify it executes without errors.
    
    6. **Append a Phase 4 entry to `{experiment_path}/log.json`** under `phases`:
       ```json
       {
         "name": "Phase 4: Implement",
         "completed_at": "2026-04-17T11:30:00Z",
         "new_dependencies_installed": [
           {"name": "catboost", "version": "1.2.5"}
         ],
         "training": {
           "split": 0.2,
           "seed": 42,
           "stratified": true
         },
         "metrics": {
           "accuracy": 0.8721,
           "f1":       0.7310,
           "roc_auc":  0.9288,
           "training_time_seconds": 45.2
         },
         "notebook_executed": true,
         "errors":   [],
         "warnings": []
       }
       ```
    
       Do not overwrite earlier entries; append to the `phases` array.
    
    ## Output
    - `{experiment_path}/new.ipynb` — complete, executed notebook
    - `{experiment_path}/new_requirements.txt` — written
    - `{experiment_path}/log.json` — updated with Phase 4 implementation entry
    
  • prebuilt_autonomous_agents/applied_scientist/skills/progress/SKILL.md (skill, 4397 bytes)
    # Progress Skill
    
    ## Purpose
    Maintain a **machine-readable** progress file so dashboards, CLIs, and notebooks can poll the experiment's state at any time. The file is a JSON document — never markdown, never human-prose-first.
    
    ## When to Use
    **Constantly.** This skill is not a phase — it runs alongside every phase. You must overwrite `progress.json` at these moments:
    
    1. **Phase start** — when you begin a new phase
    2. **Phase end** — when you complete a phase
    3. **Before long operations** — before training a model, installing dependencies, reading a large PDF
    4. **On failure** — immediately when something goes wrong
    5. **On completion** — when the full experiment finishes
    
    ## File Location
    ```
    experiments/{research_name}/progress.json
    ```
    
    ## Format (CANONICAL — emit exactly)
    
    The file is **overwritten** each time (not appended). It is always the full current snapshot. Use UTC ISO-8601 timestamps. Match this schema **byte-for-byte** — do not invent alternative field names, do not use a dict where a list is specified, do not translate status values to synonyms.
    
    ```json
    {
      "name": "{research_name}",
      "status": "RUNNING",
      "started_at": "2026-04-17T10:00:00Z",
      "updated_at": "2026-04-17T10:25:00Z",
      "phases": [
        {"index": 0, "name": "Setup",           "status": "done",    "summary": "Copied notebook, data, paper."},
        {"index": 1, "name": "Analyze Current", "status": "done",    "summary": "Baseline is XGBoost, 85.3% accuracy."},
        {"index": 2, "name": "Research",        "status": "current", "summary": null},
        {"index": 3, "name": "Benchmark",       "status": "pending", "summary": null},
        {"index": 4, "name": "Implement",       "status": "pending", "summary": null},
        {"index": 5, "name": "Evaluate",        "status": "pending", "summary": null}
      ],
      "current_activity": "Reading research.pdf — extracting method summary and requirements.",
      "issues": []
    }
    ```
    
    ### Field rules (strict)
    
    - **`status`** is one of: `"RUNNING"`, `"COMPLETED"`, `"FAILED"`. Uppercase. Nothing else.
    - **`phases`** is a **JSON array**, never an object. Exactly six elements, in order: Setup, Analyze Current, Research, Benchmark, Implement, Evaluate. Use those exact `name` values.
    - **`phases[].status`** is one of: `"done"`, `"current"`, `"pending"`, `"failed"`. Lowercase. Do **not** use `"completed"`, `"in_progress"`, `"todo"`, or any other synonym.
    - **`phases[].index`** is a 0-based integer matching the position in the array.
    - Exactly one phase may have `status == "current"` while the top-level `status == "RUNNING"`. On `COMPLETED` / `FAILED`, no phase should be `"current"`.
    - **`phases[].summary`** is one short sentence, or `null` if the phase has not run yet.
    - **`current_activity`** is one or two sentences describing what is happening **right now**.
    - **`issues`** is an array of short strings; use `[]` when clean, never `null`.
    - Do **not** add extra top-level keys (e.g. `current_phase`), and do not use dict-of-phases shapes like `{"phase_0_setup": {...}}`.
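
     A small self-check against the rules above can catch schema drift before the snapshot is written. A minimal sketch (the function name is illustrative):

     ```python
     PHASE_NAMES = ["Setup", "Analyze Current", "Research", "Benchmark", "Implement", "Evaluate"]

     def validate_progress(snapshot: dict) -> list:
         """Return a list of rule violations; an empty list means the snapshot is valid."""
         errors = []
         if snapshot.get("status") not in {"RUNNING", "COMPLETED", "FAILED"}:
             errors.append("status must be RUNNING, COMPLETED, or FAILED")
         phases = snapshot.get("phases")
         if not isinstance(phases, list) or len(phases) != 6:
             return errors + ["phases must be a JSON array of exactly six entries"]
         for i, (phase, name) in enumerate(zip(phases, PHASE_NAMES)):
             if phase.get("index") != i or phase.get("name") != name:
                 errors.append(f"phase {i} must be named {name!r} with index {i}")
             if phase.get("status") not in {"done", "current", "pending", "failed"}:
                 errors.append(f"phase {i} has an invalid status")
         current = sum(1 for p in phases if p.get("status") == "current")
         if snapshot.get("status") == "RUNNING" and current != 1:
             errors.append("exactly one phase must be 'current' while RUNNING")
         if snapshot.get("status") in {"COMPLETED", "FAILED"} and current != 0:
             errors.append("no phase may be 'current' once COMPLETED or FAILED")
         if not isinstance(snapshot.get("issues"), list):
             errors.append("issues must be a list (use [] when clean)")
         return errors
     ```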
    
    ## Rules
    
    1. **Overwrite, don't append.** The file is a snapshot, not a log. `log.json` is the log.
    2. **Valid JSON only.** Never write partial/invalid JSON. Write to a temp file and rename if needed.
    3. **Update before, not after.** Update progress BEFORE starting a long operation. The user wants to know what's happening now, not what already happened.
    4. **Be honest about failures.** On error, immediately set `status = "FAILED"`, mark the current phase `"failed"`, and append a message to `issues`.
    5. **Always refresh `updated_at`** — a stale timestamp tells the user nothing is moving.
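
     Rule 2's temp-file-and-rename can be done atomically so pollers never observe partial JSON. A minimal sketch (the commented call shows the expected path):

     ```python
     import json
     import os
     import tempfile
     from datetime import datetime, timezone

     def write_progress(snapshot: dict, path: str) -> None:
         """Overwrite progress.json atomically with a refreshed updated_at."""
         snapshot["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
         fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(path) or ".", suffix=".tmp")
         with os.fdopen(fd, "w") as f:
             json.dump(snapshot, f, indent=2)
         os.replace(tmp_path, path)  # atomic rename; readers see either the old or the new snapshot

     # write_progress(snapshot, f"experiments/{research_name}/progress.json")
     ```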
    
    ## Lifecycle
    
    | Moment | Action |
    |--------|--------|
    | Phase 0 starts | Create `progress.json`, `status="RUNNING"`, all phases `pending`, Phase 0 → `current`, set `started_at` + `updated_at` |
    | Phase N starts | Previous phase → `done` with one-line `summary`; Phase N → `current`; refresh `current_activity` + `updated_at` |
    | Long operation starts | Update `current_activity` (e.g. `"Training model — this may take a few minutes"`) + `updated_at` |
    | Phase N ends | Mark Phase N → `done` with one-line `summary` |
    | Experiment completes | All phases `done`, `status="COMPLETED"`, `current_activity="Done. See result.json."` |
    | Experiment fails | `status="FAILED"`, current phase → `"failed"`, `issues` populated, `current_activity` describes the error |
    
  • prebuilt_autonomous_agents/applied_scientist/skills/research/SKILL.md (skill, 4232 bytes)
    # Research Skill
    
    ## Purpose
    Read the materialized research source and extract actionable information needed to implement the proposed method. Record the findings as a structured JSON entry.
    
    ## When to Use
    Phase 2 — after the current implementation has been analyzed.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    
    The research source was materialized into `{experiment_path}` during Phase 0. Its local path is recorded in `{experiment_path}/log.json` under `metadata.research_source`, and a short descriptive label Phase 0 chose is under `metadata.research_source_kind`. The label is free-form (common values: `pdf`, `file`, `git`, `kaggle_notebook`, `kaggle_dataset`, `arxiv`, `huggingface_model`, `html`, `idea`, `other`), but treat it as a hint only — always follow the actual path in `metadata.research_source`.
    
    Inspect that path and read whatever is there:
    
    - A single file (PDF, Markdown, HTML, `.ipynb`, text, …) → read it directly.
    - A directory → read the obvious entry points first (`README*`, `*.ipynb`, top-level notebooks or code, `docs/`, dataset descriptions), then skim the rest as needed.
    - A text **idea** (`research_source_kind == "idea"`, typically a short `research_source.md`) → read the user's description carefully and turn it into a concrete method plan. Pick a specific algorithm / library that matches the description, define the hyperparameters you will use, and document your interpretation explicitly in the Phase 2 log entry. If the idea is ambiguous, commit to a reasonable default and note the trade-off — do not invent a citation or claim the idea came from a paper.
    
    Do not try to re-fetch the source. If the content is insufficient, note what is missing in the Phase 2 log entry and proceed with the best analysis you can.
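
     A minimal sketch of locating the materialized source from `log.json` and branching on what Phase 0 produced (the experiment path in the usage line is illustrative):

     ```python
     import json
     from pathlib import Path

     def locate_research_source(experiment_path):
         """Return (local source path, Phase 0 kind label); the label is a hint only."""
         log = json.loads((Path(experiment_path) / "log.json").read_text())
         meta = log["metadata"]
         source = Path(meta.get("research_source") or Path(experiment_path) / "research.pdf")
         return source, meta.get("research_source_kind", "other")

     source, kind = locate_research_source("experiments/my_experiment")
     if source.is_dir():
         # Repo or dataset: read README*, notebooks, and docs/ first, then skim the rest.
         entry_points = sorted(source.glob("README*")) + sorted(source.glob("*.ipynb"))
     elif kind == "idea":
         idea_text = source.read_text()   # the user's description, saved verbatim in Phase 0
     else:
         pass  # single file (PDF, Markdown, HTML, .ipynb, ...): read it directly
     ```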
    
    ## Actions
    
    1. **Read the materialized research source** at `metadata.research_source` (falling back to `research.pdf` for legacy experiments) and extract:
    
       - **Method Summary:** 2-3 short paragraphs describing what the paper proposes, what problem it solves, and how it differs from traditional approaches.
       - **Pros:** each advantage the paper claims or demonstrates.
       - **Cons:** stated or inferred limitations, assumptions, or weaknesses.
       - **Implementation Requirements:**
         - Required libraries/packages (with versions if specified)
         - Required data format or preprocessing
         - Required compute resources (GPU, memory, etc.)
         - Key hyperparameters to set
       - **Compatibility Analysis:**
         - Can the method use the same data as the current baseline?
         - Does it need different preprocessing?
         - Does it output comparable predictions (same format)?
         - Can the same metrics be used for comparison?
    
    2. **Append a Phase 2 entry to `{experiment_path}/log.json`** under `phases`:
       ```json
       {
         "name": "Phase 2: Research",
         "completed_at": "2026-04-17T10:30:00Z",
         "paper": {
           "title":   "CatBoost: Unbiased Boosting with Categorical Features",
           "authors": ["Prokhorenkova et al."],
           "method_summary": "CatBoost is a gradient-boosting framework that handles categorical features natively via ordered target statistics and uses oblivious decision trees to reduce overfitting."
         },
         "pros": [
           "Native categorical handling — no manual encoding needed",
           "Reduces target leakage with ordered boosting",
           "Strong out-of-the-box performance"
         ],
         "cons": [
           "Training slower than XGBoost for small data",
           "More memory intensive"
         ],
         "requirements": {
           "new_dependencies": ["catboost>=1.2"],
           "data_format": "pandas.DataFrame with categorical columns marked",
           "compute": "CPU is sufficient; GPU optional"
         },
         "compatibility": {
           "same_data":    true,
           "same_metrics": true,
           "preprocessing_notes": "CatBoost takes raw categorical columns; do NOT pre-encode them for the new notebook."
         }
       }
       ```
    
       Do not overwrite earlier entries; append to the `phases` array.
    
    ## Output
    - `{experiment_path}/log.json` — updated with Phase 2 research entry
    - No other files created or modified
    
  • prebuilt_autonomous_agents/applied_scientist/skills/analyze_current/SKILL.md (skill, 2527 bytes)
    # Analyze Current Skill
    
    ## Purpose
    Read and understand the current baseline implementation. Extract all relevant information about the existing approach without modifying anything, and record the analysis as a structured JSON entry.
    
    ## When to Use
    Phase 1 — after experiment setup is complete and files are copied to the experiment folder.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    
    ## Actions
    
    1. **Read `{experiment_path}/current.ipynb`** and extract:
       - Model/algorithm used
       - Preprocessing steps (encoding, scaling, feature selection, etc.)
       - Training approach (train/test split ratio, cross-validation, etc.)
       - Hyperparameters
       - Metrics used and their values
       - Target variable and feature set
    
    2. **Extract dependencies:**
       - Scan all import statements in the notebook.
       - Write `{experiment_path}/current_requirements.txt` with one package per line (`package==version` if determinable, otherwise just `package`).
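
        A minimal sketch of turning the notebook's imports into requirements lines, pinning versions where the package is installed (the import-name-to-package map is illustrative and incomplete; filter out standard-library modules as needed):

        ```python
        import json
        import re
        from importlib.metadata import PackageNotFoundError, version

        # Some import names differ from their PyPI package names.
        IMPORT_TO_PACKAGE = {"sklearn": "scikit-learn", "cv2": "opencv-python", "PIL": "pillow"}

        def requirements_from_notebook(notebook_path):
            with open(notebook_path) as f:
                nb = json.load(f)
            code = "\n".join(
                "".join(cell["source"]) for cell in nb["cells"] if cell["cell_type"] == "code"
            )
            modules = set(re.findall(r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", code, re.M))
            lines = []
            for module in sorted(modules):
                package = IMPORT_TO_PACKAGE.get(module, module)
                try:
                    lines.append(f"{package}=={version(package)}")
                except PackageNotFoundError:
                    lines.append(package)   # version not determinable: name only
            return lines
        ```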
    
    3. **Read `{experiment_path}/current_data/`** (or, for code-based data, the download spec):
       - Identify data format (CSV, parquet, etc.)
       - Note number of rows, columns
       - Note data types and any special handling
    
    4. **Append a Phase 1 entry to `{experiment_path}/log.json`** under `phases`:
       ```json
       {
         "name": "Phase 1: Analyze Current",
         "completed_at": "2026-04-17T10:15:00Z",
         "model": "XGBoost",
         "preprocessing": [
           "Drop rows with NaN",
           "LabelEncoder on target",
           "LabelEncoder on categorical features",
           "StandardScaler on numerical features"
         ],
         "training": {
           "split": 0.2,
           "seed": 42,
           "stratified": true
         },
         "hyperparameters": {
           "n_estimators": 200,
           "max_depth": 6,
           "learning_rate": 0.1
         },
         "metrics": {
           "accuracy": 0.8726,
           "f1":       0.7277,
           "roc_auc":  0.9274
         },
         "target": "income",
         "features_count": 14,
         "data": {
           "source": "ucimlrepo fetch_ucirepo(id=2)",
           "format": "pandas.DataFrame",
           "rows": 45222,
           "cols": 14
         },
         "notes": "Data downloaded programmatically; both notebooks must use the same source."
       }
       ```
    
       Do not overwrite earlier entries; append to the `phases` array.
    
    ## Output
    - `{experiment_path}/log.json` — updated with complete Phase 1 analysis entry
    - `{experiment_path}/current_requirements.txt` — written
    - No other files created or modified
    
  • prebuilt_autonomous_agents/applied_scientist/skills/benchmark/SKILL.md (skill, 2416 bytes)
    # Benchmark Skill
    
    ## Purpose
    Define the comparison metrics and extract baseline values from the current implementation. Record them as a structured JSON entry so downstream phases and final evaluation can read them directly.
    
    ## When to Use
    Phase 3 — after both current analysis and research analysis are complete.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    
    ## Actions
    
    1. **Define comparison metrics:**
       - Include ALL metrics already used in `current.ipynb`.
       - Add any additional metrics that are relevant for the new method.
       - For classification: accuracy, precision, recall, F1, AUC-ROC (as applicable).
       - For regression: MSE, RMSE, MAE, R² (as applicable).
       - Include training time if measurable.
    
    2. **Extract baseline values:**
       - Read metric values from `current.ipynb` output cells.
       - If a metric is not computed in the notebook, record it as `null` and set `"needs_computation": true` — both notebooks must then compute it.
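
        A minimal sketch of assembling the metric list from the Phase 1 baseline values and flagging anything the baseline notebook did not compute (descriptions and numbers are the example values used throughout this document):

        ```python
        # Baseline values recorded by Phase 1 under `metrics` in log.json.
        baseline = {"accuracy": 0.8726, "f1": 0.7277, "roc_auc": 0.9274}

        # Metrics the comparison should cover, with the direction of improvement.
        wanted = [
            ("accuracy", "Fraction of correctly classified samples.", True),
            ("f1", "F1 score (binary, positive class).", True),
            ("roc_auc", "Area under the ROC curve.", True),
            ("training_time_seconds", "Wall-clock training time.", False),
        ]

        metrics = [
            {
                "name": name,
                "description": description,
                "higher_is_better": higher_is_better,
                "baseline": baseline.get(name),               # None becomes null in JSON
                "needs_computation": name not in baseline,    # both notebooks must compute it
            }
            for name, description, higher_is_better in wanted
        ]
        ```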
    
    3. **Append a Phase 3 entry to `{experiment_path}/log.json`** under `phases`:
       ```json
       {
         "name": "Phase 3: Benchmark",
         "completed_at": "2026-04-17T10:45:00Z",
         "metrics": [
           {
             "name": "accuracy",
             "description": "Fraction of correctly classified samples.",
             "higher_is_better": true,
             "baseline": 0.8726,
             "needs_computation": false
           },
           {
             "name": "f1",
             "description": "F1 score (binary, positive class).",
             "higher_is_better": true,
             "baseline": 0.7277,
             "needs_computation": false
           },
           {
             "name": "roc_auc",
             "description": "Area under the ROC curve.",
             "higher_is_better": true,
             "baseline": 0.9274,
             "needs_computation": false
           },
           {
             "name": "training_time_seconds",
             "description": "Wall-clock training time.",
             "higher_is_better": false,
             "baseline": null,
             "needs_computation": true
           }
         ],
         "notes": "training_time_seconds must be added to both notebooks for a fair comparison."
       }
       ```
    
       Do not overwrite earlier entries; append to the `phases` array.
    
    ## Output
    - `{experiment_path}/log.json` — updated with Phase 3 benchmark entry
    - Clear list (in `metrics`) of what the new implementation must compute
    
  • src/upsonic/prebuilt/applied_scientist/template/skills/analyze_current/SKILL.md (skill, 2527 bytes)
    # Analyze Current Skill
    
    ## Purpose
    Read and understand the current baseline implementation. Extract all relevant information about the existing approach without modifying anything, and record the analysis as a structured JSON entry.
    
    ## When to Use
    Phase 1 — after experiment setup is complete and files are copied to the experiment folder.
    
    ## Input
    | Parameter | Type | Description |
    |-----------|------|-------------|
    | experiment_path | path | `experiments/{research_name}/` |
    
    ## Actions
    
    1. **Read `{experiment_path}/current.ipynb`** and extract:
       - Model/algorithm used
       - Preprocessing steps (encoding, scaling, feature selection, etc.)
       - Training approach (train/test split ratio, cross-validation, etc.)
       - Hyperparameters
       - Metrics used and their values
       - Target variable and feature set
    
    2. **Extract dependencies:**
       - Scan all import statements in the notebook.
       - Write `{experiment_path}/current_requirements.txt` with one package per line (`package==version` if determinable, otherwise just `package`).
    
    3. **Read `{experiment_path}/current_data/`** (or, for code-based data, the download spec):
       - Identify data format (CSV, parquet, etc.)
       - Note number of rows, columns
       - Note data types and any special handling
    
    4. **Append a Phase 1 entry to `{experiment_path}/log.json`** under `phases`:
       ```json
       {
         "name": "Phase 1: Analyze Current",
         "completed_at": "2026-04-17T10:15:00Z",
         "model": "XGBoost",
         "preprocessing": [
           "Drop rows with NaN",
           "LabelEncoder on target",
           "LabelEncoder on categorical features",
           "StandardScaler on numerical features"
         ],
         "training": {
           "split": 0.2,
           "seed": 42,
           "stratified": true
         },
         "hyperparameters": {
           "n_estimators": 200,
           "max_depth": 6,
           "learning_rate": 0.1
         },
         "metrics": {
           "accuracy": 0.8726,
           "f1":       0.7277,
           "roc_auc":  0.9274
         },
         "target": "income",
         "features_count": 14,
         "data": {
           "source": "ucimlrepo fetch_ucirepo(id=2)",
           "format": "pandas.DataFrame",
           "rows": 45222,
           "cols": 14
         },
         "notes": "Data downloaded programmatically; both notebooks must use the same source."
       }
       ```
    
       Do not overwrite earlier entries; append to the `phases` array.
    
    ## Output
    - `{experiment_path}/log.json` — updated with complete Phase 1 analysis entry
    - `{experiment_path}/current_requirements.txt` — written
    - No other files created or modified
    

README

Upsonic

Build Autonomous AI Agents in Python

Documentation · Quickstart · Examples · Discord


Overview

Upsonic is a Python framework for building autonomous agents like OpenClaw and Claude Cowork, as well as more traditional agent systems.

Quick Start

Installation

uv pip install upsonic
# pip install upsonic

IDE Integration

Add Upsonic docs as a source in your coding tools:

Cursor: Settings → Indexing & Docs → Add https://docs.upsonic.ai/llms-full.txt

Also works with VSCode, Windsurf, and similar tools.


Create Autonomous Agent

Build Your Own

from upsonic import AutonomousAgent, Task

agent = AutonomousAgent(
    model="anthropic/claude-sonnet-4-5",
    workspace="/path/to/logs"
)

task = Task("Analyze server logs and detect anomaly patterns")

agent.print_do(task)

All file and shell operations are restricted to the workspace. Path traversal and dangerous commands are blocked.

Use Our Prebuilt Ones

Prebuilt autonomous agents are ready-to-run agents built by the Upsonic community, each packaging a skill, system prompt, and first message so you can go from install to running in seconds. The collection is open to contributions: bring your agent and open a PR.

Learn more: Prebuilt Autonomous Agents

Next steps: Connect a Sandbox Provider (E2B) for isolated cloud execution environments.


Create Traditional Agent

from upsonic import Agent, Task

agent = Agent(model="anthropic/claude-sonnet-4-5", name="Stock Analyst Agent")

task = Task(description="Analyze the current market trends")

agent.print_do(task)

Add Custom Tools

from upsonic import Agent, Task
from upsonic.tools import tool

@tool
def sum_tool(a: float, b: float) -> float:
    """
    Add two numbers together.

    Args:
        a: First number
        b: Second number

    Returns:
        The sum of a and b
    """
    return a + b

task = Task(
    description="Calculate 15 + 27",
    tools=[sum_tool]
)

agent = Agent(model="anthropic/claude-sonnet-4-5", name="Calculator Agent")

result = agent.print_do(task)

Next steps: Integrate MCP Tools to connect your agents to thousands of external data sources and services.


OCR and Document Processing

Upsonic provides a unified OCR interface with a layered pipeline: Layer 0 handles document preparation (PDF-to-image conversion, preprocessing), and Layer 1 runs the OCR engine.

uv pip install "upsonic[ocr]"

from upsonic.ocr import OCR
from upsonic.ocr.layer_1.engines import EasyOCREngine

engine = EasyOCREngine(languages=["en"])
ocr = OCR(layer_1_ocr_engine=engine)

text = ocr.get_text("invoice.pdf")
print(text)

Supported engines: EasyOCR, RapidOCR, Tesseract, PaddleOCR, DeepSeek OCR, DeepSeek via Ollama.

Learn more: OCR Documentation


Check Our Videos

  • Upsonic Demo Video 1
  • Upsonic Demo Video 2

Documentation and Resources

Community and Support

💬 Join our Discord community! — Ask questions, share what you're building, get help from the team, and connect with other developers using Upsonic.

  • Discord - Chat with the community and get real-time support
  • Issue Tracker - Report bugs and request features
  • Changelog - See what's new in each release

License

Upsonic is released under the MIT License. See LICENCE for details.

Contributing

We welcome contributions from the community! Please read our Contributing Guide and code of conduct before submitting pull requests.