SPiP: Splicing Prediction Pipeline, a machine learning tool for massive detection of exonic and intronic variant effects on mRNA splicing

Publication: Contribution to journal › Journal article › Research › Peer-reviewed

Documents

  • Full text

    Publisher's published version, 2.28 MB, PDF document

  • Raphaël Leman
  • Béatrice Parfait
  • Dominique Vidaud
  • Emmanuelle Girodon
  • Laurence Pacot
  • Gérald Le Gac
  • Chandran Ka
  • Claude Ferec
  • Yann Fichou
  • Céline Quesnelle
  • Camille Aucouturier
  • Etienne Muller
  • Dominique Vaur
  • Laurent Castera
  • Flavie Boulouard
  • Agathe Ricou
  • Hélène Tubeuf
  • Omar Soukarieh
  • Pascaline Gaildrat
  • Florence Riant
  • Marine Guillaud-Bataille
  • Sandrine M. Caputo
  • Virginie Caux-Moncoutier
  • Nadia Boutry-Kryza
  • Françoise Bonnet-Dorion
  • Ines Schultz
  • Olivier Quenez
  • Louis Goldenberg
  • Valentin Harter
  • Michael T. Parsons
  • Amanda B. Spurdle
  • Thierry Frébourg
  • Alexandra Martins
  • Claude Houdayer
  • Sophie Krieger

Modeling splicing is essential for tackling the challenge of variant interpretation, as each nucleotide variation can be pathogenic by affecting pre-mRNA splicing via disruption/creation of splicing motifs such as 5′/3′ splice sites, branch sites, or splicing regulatory elements. Unfortunately, most in silico tools focus on a specific type of splicing motif, which is why we developed the Splicing Prediction Pipeline (SPiP) to perform, in one single bioinformatic analysis based on a machine learning approach, a comprehensive assessment of the variant effect on different splicing motifs. We gathered a curated set of 4616 variants scattered all along the sequence of 227 genes, with their corresponding splicing studies. The Bayesian analysis provided us with the number of control variants, that is, variants without impact on splicing, to mimic the deluge of variants from high-throughput sequencing data. Results show that SPiP can deal with the diversity of splicing alterations, with 83.13% sensitivity and 99% specificity to detect spliceogenic variants. Overall performance, as measured by the area under the receiver operating characteristic curve, was 0.986, better than SpliceAI and SQUIRLS (0.965 and 0.766) for the same data set. SPiP lends itself to a unique suite for comprehensive prediction of spliceogenicity in the genomic medicine era. SPiP is available at: https://sourceforge.net/projects/splicing-prediction-pipeline/.
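For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how sensitivity, specificity, and the area under the ROC curve can be computed from per-variant scores. It is not SPiP code and does not reproduce the paper's data: the function names, the toy labels and scores, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
# Illustrative only: toy data and helper names are assumptions, not taken from SPiP.

def sensitivity_specificity(labels, scores, threshold):
    """Return (sensitivity, specificity) for a score cutoff.

    labels: 1 = spliceogenic variant, 0 = variant without impact on splicing.
    A variant is predicted spliceogenic when its score is >= threshold.
    """
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive scores higher than a randomly chosen negative
    (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy set: 4 spliceogenic variants and 6 controls.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.30, 0.40, 0.20, 0.10, 0.05, 0.02, 0.01]

sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
auc = roc_auc(labels, scores)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} AUC={auc:.3f}")
```

Sensitivity and specificity depend on the chosen cutoff, whereas the AUC summarizes ranking quality across all cutoffs, which is why the abstract reports both kinds of figure.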

Original language: English
Journal: Human Mutation
Volume: 43
Issue number: 12
Pages (from-to): 2308-2323
Number of pages: 16
ISSN: 1059-7794
DOI
Status: Published - 2022

Bibliographical note

Funding Information:
We wish to thank the members of the splicing subgroup of the Unicancer Genetic Group. We thank the members of the laboratoire de biologie et génétique du cancer (Centre François Baclesse, Caen, France) for their participation. We are grateful to the French Fondation de France (200412859), the Institut National du Cancer/Direction Générale de l'Offre de Soins (INCA/DGOS, AAP/CFB/CI), the Cancéropôle Nord-Ouest (CNO), the Groupement des Entreprises Françaises dans la Lutte contre le Cancer (Gefluc, # R18064EE), and the OpenHealth Institute for supporting this work. R. L. was co-supported by the Fédération Hospitalo-Universitaire (FHU), H. T. was funded by a CIFRE PhD fellowship (#2015/0335) from the French Association Nationale de la Recherche et de la Technologie (ANRT) in the context of a public–private partnership between INSERM and Interactive Biosoftware, and A. B. S. is supported by an NHMRC Senior Research Fellowship (ID1061779). M. T. P. is supported in part by funding from The Cancer Council Queensland (ID1086286). S. M. C. is supported by the Institut National du Cancer (INCa). The funding bodies played no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript. We would like to thank the FHU-G4 genomics group for supporting this work. We thank Dr. Diana Baralle for her valuable advice.

Publisher Copyright:
© 2022 The Authors. Human Mutation published by Wiley Periodicals LLC.

ID: 346412397