
Exploring Foundation Models Fine-Tuning for Cytology Tasks

Implementation of “Exploring Foundation Models Fine-Tuning for Cytology Tasks”.

In this paper, we explore the application of existing foundation models to cytological classification tasks, focusing on low-rank adaptation (LoRA), a parameter-efficient fine-tuning method well-suited to few-shot learning scenarios. We evaluate five foundation models across four cytological classification datasets. Our results demonstrate that fine-tuning the pre-trained backbones with LoRA significantly enhances model performance compared to merely fine-tuning the classifier head, achieving state-of-the-art results on both simple and complex classification tasks while requiring fewer data samples.

Authors: M. Dausort, T. Godelaine, M. Zanella, K. El Khoury, I. Salmon, B. Macq

NB: This GitHub repository is based on the implementation of CLIP-LoRA.

Contents

Installation

NB: The Python version used is 3.9.13.

  1. Create a virtual environment, clone the GitHub repository, and install the required packages:
   python3 -m venv cyto_ft_venv
   source cyto_ft_venv/bin/activate
   pip3 install torch==2.2.2 torchaudio==2.2.2 torchvision==0.17.2
   git clone https://github.com/mdausort/Cytology-fine-tuning.git
   cd Cytology-fine-tuning
   pip3 install -r requirements.txt
  2. Download the datasets:

Each dataset must be divided into three folders: train, val, and test. Images are named following the pattern classname_number (the class label followed by an index).

Important: All file paths in the scripts are set with the placeholder “TO CHANGE”. Search for this placeholder in the cloned repository’s files and replace it with the appropriate path /root/path/ for your system. In this setup, the datasets are placed inside a folder named ./data.
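
Since labels are inferred from the `classname_number` file names, a quick sanity check of the layout can save debugging time. The helpers below are a hypothetical sketch (file names, extensions, and paths are illustrative assumptions, not taken from the repository):

```python
from pathlib import Path

# Hypothetical sanity-check helpers for the expected dataset layout.
# File names, extensions, and paths here are assumptions for illustration,
# not taken from the repository.

def class_from_filename(filename: str) -> str:
    """Recover the class label from a "classname_number" file name."""
    stem = Path(filename).stem        # "squamous_12.png" -> "squamous_12"
    return stem.rsplit("_", 1)[0]     # "squamous_12"     -> "squamous"

def check_split_dirs(root: str) -> list[str]:
    """Return the expected split folders missing under a dataset root."""
    return [s for s in ("train", "val", "test")
            if not (Path(root) / s).is_dir()]

print(class_from_filename("squamous_12.png"))  # squamous
```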

Usage

To launch the experiments, use the provided launch_run.sh bash script:

  1. Open the relevant script and locate the line for the experiment you want to run (e.g., line 28 for Experiment 1), then uncomment it to enable that experiment’s settings.
  2. Start the experiment by executing the launch_run.sh script:
   bash launch_run.sh
  3. To visualize the changes and track the experiment’s progress, the code must be integrated with Weights & Biases. Add the following line to your script if it is not already included:
   wandb.init(project='your_project_name')

You can view the results and metrics of your experiment on Weights & Biases.

  4. The results of the experiment are also saved to a JSON file for further analysis or documentation.
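
For quick inspection without the W&B dashboard, the JSON results file can be loaded directly. This is a generic sketch; the actual file name and metric keys depend on the script's output and are assumptions here:

```python
import json

def summarize_results(path: str) -> dict:
    """Load a results JSON file and print its metrics (keys are assumptions)."""
    with open(path) as f:
        results = json.load(f)
    for metric, value in results.items():
        print(f"{metric}: {value}")
    return results
```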

Experiment 1: Linear Classifier

python3 main.py --root_path ./data/ \
                --dataset {dataset} \
                --seed {seed} \
                --shots -1 \
                --lr {lr} \
                --n_iters 50 \
                --model_name {model_name} \
                --num_classes {num_classes} \
                --level {level} \
                --textual False
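
With a frozen backbone, Experiment 1 amounts to a linear probe: only the classifier head is trained on extracted features. A minimal sketch of that setup (the feature dimension, class count, optimizer, and learning rate are placeholders, not the repository's exact configuration):

```python
import torch
import torch.nn as nn

# Minimal linear-probe sketch: freeze the backbone, train only the head.
# Dimensions and hyperparameters below are illustrative placeholders.
feat_dim, num_classes = 512, 5

backbone = nn.Identity()            # stands in for a frozen foundation model
for p in backbone.parameters():
    p.requires_grad = False

head = nn.Linear(feat_dim, num_classes)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One toy training step on random "features" and labels.
features = torch.randn(8, feat_dim)
labels = torch.randint(0, num_classes, (8,))
loss = criterion(head(backbone(features)), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```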

Experiment 2: LoRA Few-Shot Adaptation

python3 main_lora.py --root_path ./data/ \
                     --dataset {dataset} \
                     --seed {seed} \
                     --shots {shots} \
                     --lr {lr} \
                     --n_iters 50 \
                     --position "all" \
                     --encoder "vision" \
                     --params "q v" \
                     --r 2 \
                     --model_name {model_name} \
                     --num_classes {num_classes} \
                     --level {level}
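
Conceptually, `--params "q v"` with `--r 2` injects rank-2 LoRA adapters into the query and value projections of the frozen vision encoder. The sketch below uses standard LoRA conventions (a frozen base layer plus a low-rank update that starts at zero); names and scaling are generic, not the repository's exact implementation:

```python
import torch
import torch.nn as nn

# Generic LoRA adapter around a frozen linear projection (e.g. q or v).
# Conventions (A init small, B init zero, alpha/r scaling) are standard
# LoRA practice, not necessarily the repo's exact code.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 2, alpha: float = 1.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():   # freeze the pre-trained weights
            p.requires_grad = False
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        # Frozen path + scaled low-rank update B @ A (zero at init).
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768), r=2)
out = layer(torch.randn(4, 768))
```

Because B is initialized to zero, the adapted layer reproduces the frozen layer exactly at the start of training; only A and B receive gradients.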

Experiment 3: Pushing Model Fine-Tuning Limits

python3 main_lora.py --root_path ./data/ \
                     --dataset hicervix \
                     --seed {seed} \
                     --shots 0 \
                     --lr 1e-3 \
                     --n_iters 100 \
                     --position "all" \
                     --encoder "vision" \
                     --pourcentage {pourcentage} \
                     --params "q k v o" \
                     --r 16 \
                     --model_name clip \
                     --level level_3
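
As a rough check of how lightweight this setup remains, one can count the LoRA parameters added by rank r = 16 on the q, k, v, and o projections. The encoder dimensions below are assumptions (a ViT-B/16-style vision transformer: 12 blocks of width 768), not figures from the paper:

```python
# Back-of-the-envelope LoRA parameter count for Experiment 3's settings:
# rank r = 16 on the q, k, v, and o projections of every block.
# Encoder dimensions are assumptions (ViT-B/16-style: 12 blocks, width 768).
d, r, n_blocks, n_proj = 768, 16, 12, 4

params_per_proj = r * d + d * r          # A (r x d) plus B (d x r)
total = params_per_proj * n_proj * n_blocks
print(total)  # 1179648 trainable parameters, ~1.2 M
```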

Contact

If you have any questions, you can contact us by email: manon.dausort@uclouvain.be, tiffanie.godelaine@uclouvain.be
