Cell Segmentator

Overview

This repository provides two main scripts to configure and run a cell segmentation workflow:

generate_config.py: Interactive script to create JSON configuration files for training or prediction.
main.py: Entry point to train, test, or predict using the generated configuration.

Installation

Install uv: Follow the official guide at https://docs.astral.sh/uv/

Linux / macOS

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Verify the installation:

uv --version

Clone the repository:

git clone https://git.ai.infran.ru/ilyukhin/model-v
cd model-v

Install dependencies:
```
uv sync
```

Dataset Structure

Your data directory must follow this hierarchy:

path_to_data_folder/
├── images/        # Input images
│   ├── img1.tif
│   ├── img2.tif
│   └── ...
└── masks/         # Ground-truth instance masks
    ├── mask1.tif
    ├── mask2.tif
    └── ...

If your dataset contains multiple classes (e.g., class A and B) and you prefer not to duplicate images, you can organize masks into class-specific subdirectories:

path_to_data_folder/
├── images/
│   └── img1.tif
└── masks/
    ├── A/      # Masks for class A
    │   ├── img1_mask.tif
    │   └── ...
    └── B/      # Masks for class B
        ├── img1_mask.tif
        └── ...

In this case, set the masks_subdir field in your dataset configuration to the name of the mask subdirectory (e.g., "A" or "B").
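The layout above can be checked before a run with a short script. This is a minimal sketch, not code from the repository: the function name list_dataset_files is hypothetical, and the rule that the number of images must equal the number of masks is an assumption made for illustration.

```python
from pathlib import Path


def list_dataset_files(data_dir, masks_subdir=""):
    """Return sorted image and mask paths for the documented folder layout."""
    root = Path(data_dir)
    images_dir = root / "images"
    # An empty masks_subdir means the masks sit directly under masks/.
    masks_dir = root / "masks" / masks_subdir if masks_subdir else root / "masks"

    for folder in (images_dir, masks_dir):
        if not folder.is_dir():
            raise FileNotFoundError(f"Expected directory is missing: {folder}")

    images = sorted(p for p in images_dir.iterdir() if p.is_file())
    masks = sorted(p for p in masks_dir.iterdir() if p.is_file())

    # Assumption: one mask file per image. Only the counts are compared here,
    # because the file naming rule that pairs them is not documented.
    if len(images) != len(masks):
        raise ValueError(f"{len(images)} images but {len(masks)} masks in {masks_dir}")
    return images, masks
```

For the multi-class layout, list_dataset_files("path_to_data_folder", "A") would look for masks under masks/A/.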

Mask format: For multi-label segmentation, provide instance masks in channel-last order, i.e., each mask array must have shape (H, W, C).
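A quick shape check can catch masks that are not three-dimensional before they reach the dataloader. The helper below is a hypothetical sketch (check_mask_shape is not part of the repository). It works on a shape tuple, so it cannot tell a channel-first (C, H, W) array from a channel-last one; it only confirms that three positive dimensions are present.

```python
def check_mask_shape(shape):
    """Validate that a mask shape has the form (H, W, C) and return its parts."""
    if len(shape) != 3:
        raise ValueError(f"Expected a 3-D mask of shape (H, W, C), got {tuple(shape)}")
    height, width, channels = shape
    if min(height, width, channels) < 1:
        raise ValueError(f"All mask dimensions must be positive, got {tuple(shape)}")
    return height, width, channels
```

With a NumPy array this would be called as check_mask_shape(mask.shape). If a mask is stored channel-first, numpy.moveaxis(mask, 0, -1) moves the channel axis to the end.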

generate_config.py

This script guides you through creating a JSON configuration for either training or prediction.

Usage

python generate_config.py

Training mode: answer y or n.
Model selection: choose from the models available in the registry.
If training, also select:
- Criterion
- Optimizer
- Scheduler

The configuration is saved under config/templates/train/ or config/templates/predict/ with a unique filename.

The generated config includes these sections:

model: Model component and parameters
dataset_config: Paths, training flag, and mask subdirectory (if any)
wandb_config: Weights & Biases integration settings
(If training) criterion, optimizer, scheduler
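A generated file can be sanity-checked against the section list above. The sketch below assumes that each section name is a top-level key of the JSON object, which the README does not state explicitly. The helper names load_config and missing_sections are hypothetical.

```python
import json

# Section names as listed above; treating them as top-level keys is an assumption.
BASE_SECTIONS = ("model", "dataset_config", "wandb_config")
TRAINING_SECTIONS = ("criterion", "optimizer", "scheduler")


def load_config(path):
    """Read a JSON config file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_sections(config, training):
    """Return the expected section names that are absent from the config."""
    expected = BASE_SECTIONS + (TRAINING_SECTIONS if training else ())
    return [name for name in expected if name not in config]
```

An empty list from missing_sections(load_config(path), training=True) means that all six sections are present.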

main.py

Entry point for running training, testing, or prediction from a config file.

Command-line Arguments

python main.py [-c CONFIG] [-m {train,test,predict}] [--no-save-masks] [--only-masks]

-c, --config : Path to JSON config file (default: config/templates/train/...json).
-m, --mode : train, test, or predict (default: train).
--no-save-masks : Disable saving predicted masks.
--only-masks : Save only raw predicted masks (no visual overlays).
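For readers who want to wrap or script these options, the usage line above maps onto argparse roughly as follows. This is an illustrative sketch, not the parser used in main.py. The default for --config is left unset here because the documented default path is abbreviated.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="main.py")
    parser.add_argument("-c", "--config", help="Path to JSON config file")
    parser.add_argument(
        "-m", "--mode",
        choices=("train", "test", "predict"),
        default="train",
        help="Run mode",
    )
    parser.add_argument(
        "--no-save-masks", action="store_true",
        help="Disable saving predicted masks",
    )
    parser.add_argument(
        "--only-masks", action="store_true",
        help="Save only raw predicted masks (no visual overlays)",
    )
    return parser
```

Parsing ["-m", "predict", "--only-masks"] with this parser yields mode "predict" with only_masks set and no_save_masks unset.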

Workflow

Load config and verify mode consistency.
Initialize Weights & Biases if enabled.
Create CellSegmentator and dataloaders with appropriate transforms.
Print dataset info for the first batch.
Run training or inference (.run()).
Save model checkpoint and upload to W&B if in training mode.

Configurable Parameters

A brief overview of the key parameters you can adjust in your JSON config:

Common Settings (`common`)

seed (int): Random seed for data splitting and reproducibility (default: 0).
device (str): Compute device to use, e.g., 'cuda:0' or 'cpu' (default: 'cuda:0').
use_amp (bool): Enable Automatic Mixed Precision for faster training (default: false).
masks_subdir (str): Name of subdirectory under masks/ containing the instance masks (default: "").
predictions_dir (str): Output directory for saving predicted masks (default: ".").
pretrained_weights (str): Path to pretrained model weights (default: "").

Training Settings (`training`)

is_split (bool): Whether your data is already split (true) or needs splitting (false, default).
split / pre_split: Directories for data when pre-split or unsplit.
train_size, valid_size, test_size (int/float): Size (count) or ratio of each split (e.g., 0.7, 0.1, 0.2).
batch_size (int): Number of samples per training batch (default: 1).
num_epochs (int): Total training epochs (default: 100).
val_freq (int): Frequency (in epochs) to run validation (default: 1).
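The int/float convention for split sizes can be illustrated with a small seeded splitter. This sketch assumes that a float is a ratio of the whole dataset and an int is an absolute count. The function names are hypothetical, and the repository's own splitting logic may differ, for example in how it rounds.

```python
import random


def resolve_count(size, total):
    """Turn a split size into a sample count (float = ratio, int = count)."""
    if isinstance(size, float):
        if not 0.0 <= size <= 1.0:
            raise ValueError(f"Ratio must be within [0, 1], got {size}")
        # Floor the product; the small epsilon guards against products that
        # land just below a whole number because of float rounding.
        return int(size * total + 1e-9)
    if not 0 <= size <= total:
        raise ValueError(f"Count must be within [0, {total}], got {size}")
    return size


def split_indices(total, train_size, valid_size, test_size, seed=0):
    """Shuffle sample indices with a fixed seed and cut them into three splits."""
    n_train, n_valid, n_test = (
        resolve_count(s, total) for s in (train_size, valid_size, test_size)
    )
    if n_train + n_valid + n_test > total:
        raise ValueError("Requested splits exceed the dataset size")

    indices = list(range(total))
    random.Random(seed).shuffle(indices)  # same seed, same order

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test
```

With 10 samples, split_indices(10, 0.7, 0.1, 0.2) yields splits of 7, 1, and 2 indices, and repeating the call with the same seed returns the same split.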

Testing Settings (`testing`)

test_dir (str): Directory containing test data (default: ".").
test_size (int/float): Portion or count of data for testing (default: 1.0).
shuffle (bool): Shuffle test data before evaluation (default: true).

Batch size note: Validation, testing, and prediction runs always use a batch size of 1, regardless of the batch_size setting in the training configuration.

Examples

Generate a training config

python generate_config.py
# Follow prompts to select model, criterion, optimizer, scheduler
# Output saved to config/templates/train/YourConfig.json

Train a model

python main.py -c config/templates/train/YourConfig.json -m train

Predict on new data

python main.py -c config/templates/predict/YourConfig.json -m predict
