28/02/2022 | Hugo Gangloff (Université Bretagne Sud) | | 11:00 - ACTIA room |
---|
| Deep pairwise and triplet Markov chains for unsupervised signal processing | | |
---|
| Abstract: Probabilistic graphical models such as hidden Markov models have found many applications in unsupervised signal processing, such as part-of-speech tagging, image segmentation and genetic sequence analysis. In this presentation, we focus on pairwise and triplet Markov chain models, which define very general frameworks that remain largely unexplored. While pairwise Markov chains strictly extend the direct dependencies that can be introduced in the model, triplet Markov chains additionally enable the introduction of much more complex probability distributions. However, such generalizations raise several questions: the choice of the probability distributions, their parametrization, and unsupervised parameter estimation in the complex models that can be built. We will explore these questions and propose answers: i) we use an auxiliary latent process to implicitly define complex probability distributions; ii) we address the parametrization issue by embedding deep neural networks in the new models; and iii) we derive a general algorithm, based on a lower bound of the log-likelihood, to perform unsupervised parameter estimation in these sequential models. We show that the new models outperform hidden Markov chains and the classical extensions usually considered in the literature. | | |
---|
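The following toy example shows why pairwise Markov chains remain tractable. It computes the likelihood of an observed sequence with a forward recursion over the hidden states, using only the fact that the pair (hidden state, observation) is itself Markov. Several things are assumptions made for illustration: discrete hidden states and observations, the hypothetical transition table `trans[(x, y)][(x2, y2)]` standing for p(x_{n+1}, y_{n+1} | x_n, y_n), and the function names. This is not the deep PMC/TMC model of the talk, which adds a neural parametrization and an auxiliary latent process. The brute-force sum only serves to check the recursion.

```python
import itertools
from typing import Dict, List, Tuple

State = Tuple[int, int]  # (hidden x, observed y)

def pmc_likelihood(obs: List[int], hidden_states: List[int],
                   init: Dict[State, float],
                   trans: Dict[State, Dict[State, float]]) -> float:
    """Forward recursion for a pairwise Markov chain (PMC).

    alpha_n(x) = p(x_n = x, y_1..y_n). Because (x_n, y_n) is Markov,
    alpha_{n+1}(x') = sum_x alpha_n(x) * p(x', y_{n+1} | x, y_n).
    """
    alpha = {x: init.get((x, obs[0]), 0.0) for x in hidden_states}
    for y_prev, y_next in zip(obs, obs[1:]):
        alpha = {
            x_next: sum(alpha[x] * trans[(x, y_prev)].get((x_next, y_next), 0.0)
                        for x in hidden_states)
            for x_next in hidden_states
        }
    return sum(alpha.values())

def brute_force_likelihood(obs, hidden_states, init, trans) -> float:
    """Sum the joint probability over every hidden path (exponential cost)."""
    total = 0.0
    for path in itertools.product(hidden_states, repeat=len(obs)):
        p = init.get((path[0], obs[0]), 0.0)
        for n in range(1, len(obs)):
            p *= trans[(path[n - 1], obs[n - 1])].get((path[n], obs[n]), 0.0)
        total += p
    return total

# Toy PMC: hidden x in {0, 1}, observed y in {0, 1}. The move to (x', y')
# depends on both x and y, a dependency an HMM does not allow.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
init = {p: 0.25 for p in pairs}
trans = {
    (0, 0): dict(zip(pairs, [0.5, 0.2, 0.2, 0.1])),
    (0, 1): dict(zip(pairs, [0.1, 0.6, 0.1, 0.2])),
    (1, 0): dict(zip(pairs, [0.3, 0.1, 0.4, 0.2])),
    (1, 1): dict(zip(pairs, [0.1, 0.1, 0.2, 0.6])),
}
obs = [0, 1, 1, 0]
print(pmc_likelihood(obs, [0, 1], init, trans))
```

The recursion costs O(N·K²) for N observations and K hidden states. The brute-force sum costs O(K^N), which is why the check is only practical on tiny sequences.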
|
21/02/2022 | Ricardo Carrizo (Paris II Assas) | | 11:00 - ACTIA room |
---|
| Development of a novel GWAS method for the detection of causal genes with population specific allelic effects | | |
---|
|
14/02/2022 | Kosuke Hamazaki (University of Tokyo) | | 11:00 - ACTIA room |
---|
| Development of a novel GWAS method for the detection of causal genes with population specific allelic effects | | |
---|
| Abstract: GWAS (Genome-Wide Association Study) aims at detecting candidate genes (QTL) associated with a target trait via statistical testing. It is now widely used not only in humans but also in plant and animal genetics and breeding. A classical GWAS starts with the constitution of a panel of individuals, usually gathered from different populations. This population structure needs to be accounted for in the analysis in order to efficiently control the Type I error rate of the QTL detection procedure. Many methods have been proposed to control the false positive rate in large datasets with strong population structure, e.g., a mixed-effects model that corrects for family relatedness by treating polygenic effects as random effects. However, most methods assume the same QTL effect across populations, which does not always hold in real biological processes. Rio et al. (PLOS Genetics, 2020) proposed a method that accounts for population-specific QTL effects by testing marker effects in each population separately, using prior information on the population membership of each individual. This information on population structure, however, may not always be available, in which case the methodology of Rio et al. cannot be applied. We propose a novel method that does not require prior knowledge of the population structure. In the proposed model, we explicitly include an interaction term between a SNP of interest and the genetic background in the conventional GWAS model. The effect of the interaction term is tested simultaneously with the effect of the SNP of interest. The proposed model, the SNPxGB model, can be justified because population-specific QTL effects can be regarded as an interaction between the QTLs and the genetic background. In this seminar, I will first illustrate the classical GWAS method with the mixed-effects model, including the influence of population structure on GWAS results. Then, I will introduce the model of Rio et al. with prior information on the population structure. Finally, I will explain the proposed model, which includes the interaction term between SNPs and the genetic background. I will also introduce a model that combines the proposed model with a haplotype-based approach, the HBxGB model, in order to control the Type I error rate better than the SNPxGB model. The different procedures will be compared on simulated data. | | |
---|
|
07/02/2022 | Edi Prifti (IRD) | | 11:00 - ACTIA room |
---|
| Towards precision medicine based on artificial intelligence and the integration of massive, heterogeneous data | | |
---|
| Abstract: Biomedical data are complex and constantly growing. Processing, integrating and modeling them pose major challenges. The goal of my research project is to contribute to methodological, translational and applied research, particularly in connection with countries of the Global South. This research project is structured around three axes: 1) scalable algorithms for microbiome modeling, 2) biomarker discovery and development of diagnostic tests, 3) democratization of artificial intelligence (AI) and related medical devices. The first axis aims to help overcome the following three bottlenecks. Bottleneck 1: determining and managing accurate metagenomic catalogs for genome reconstruction. This problem is hard to solve because of its algorithmic as well as its computational complexity. Bottleneck 2: understanding the relationships between the different components of the microbiome and building abstractions that best represent them. Bottleneck 3: studying the dynamics and balance of these relationships by exploring a multi-agent simulation approach. The second axis focuses on developing interpretable methodological approaches to identify predictive microbiome signatures, as well as on more practical aspects aimed at establishing protocols for turning these signatures into concrete models usable in the clinic. The third axis of the project lies at the heart of AI applications that integrate data from related medical sensors with clinical or environmental data, in collaboration with clinician colleagues. Overall, through existing and future collaborations, I wish to propose solutions to problems linked to societal demand in action research in the Global South. | | |
---|
|
31/01/2022 | Achille Thin (CMAP Ecole Polytechnique) | | 11:00 - ACTIA room |
---|
| Monte Carlo Variational Auto-Encoders |
---|
| Abstract: Variational auto-encoders (VAEs) are popular deep latent variable models trained by maximizing an Evidence Lower Bound (ELBO). To obtain a tighter ELBO, and hence better variational approximations, it has been proposed to use importance sampling to get a lower-variance estimate of the evidence. However, importance sampling is known to perform poorly in high dimensions. While the literature has repeatedly suggested using more sophisticated algorithms such as Annealed Importance Sampling (AIS) and its Sequential Importance Sampling (SIS) extensions, the potential benefits of these advanced techniques have never been realized for VAEs: the AIS estimate cannot be easily differentiated, while SIS requires the specification of carefully chosen backward Markov kernels. In this work, we address both issues and demonstrate the performance of the resulting Monte Carlo VAEs on a variety of applications. |
---|
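The importance-sampling estimate of the evidence mentioned above is a log-mean-exp of log importance weights, log (1/K) Σ_k p(x, z_k) / q(z_k) with z_k drawn from q. The sketch below computes this quantity. It assumes a toy Gaussian model (prior z ~ N(0, 1), likelihood x | z ~ N(z, 1), so p(x) = N(0, 2)), chosen because the evidence is then known in closed form. The function names are illustrative. This is plain importance sampling, not the AIS/SIS-based Monte Carlo VAE estimator presented in the talk.

```python
import math
import random

def normal_logpdf(v: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (v - mean) ** 2 / var)

def log_mean_exp(values):
    # Numerically stable log((1/K) * sum(exp(v))).
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def is_log_evidence(x: float, q_mean: float, q_var: float, k: int,
                    rng: random.Random) -> float:
    """Importance-sampling estimate of log p(x) with proposal q = N(q_mean, q_var).

    Toy model (assumption): z ~ N(0, 1), x | z ~ N(z, 1).
    """
    log_w = []
    for _ in range(k):
        z = rng.gauss(q_mean, math.sqrt(q_var))
        log_joint = normal_logpdf(z, 0.0, 1.0) + normal_logpdf(x, z, 1.0)
        log_w.append(log_joint - normal_logpdf(z, q_mean, q_var))
    return log_mean_exp(log_w)

x = 1.3
exact = normal_logpdf(x, 0.0, 2.0)  # p(x) = N(0, 2) in closed form
rng = random.Random(0)
# With the exact posterior N(x/2, 1/2) as proposal, every weight equals p(x).
print(exact, is_log_evidence(x, x / 2, 0.5, k=10, rng=rng))
# With a mismatched proposal (here the prior), K = 1 gives the usual ELBO;
# larger K tightens the bound in expectation.
print(is_log_evidence(x, 0.0, 1.0, k=1, rng=rng), is_log_evidence(x, 0.0, 1.0, k=1000, rng=rng))
```

Working in log space with a max shift avoids underflow when individual weights are tiny, which becomes the common case as the latent dimension grows.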
|
24/01/2022 | Marine Demangeot (LPSM, Sorbonne Université) | | 11:00 - ACTIA room |
---|
| Estimation of the extremal coefficient function based on a single spatial observation |
---|
| Abstract: The extremal coefficient function is a bivariate measure of spatial dependence for stationary max-stable processes. It is usually estimated from time series, when the spatial object under study is observed through time (e.g. extreme precipitation, extreme temperatures, high concentrations of air pollution). However, in some cases such data cannot be accessed: only one or just a few records are available. This is the case, for instance, in mining resource estimation, soil contamination evaluation, or any other application where the phenomenon of interest either varies too slowly over time to hope for a decent time series or is too expensive to sample. This situation is rarely addressed in the spatial extremes community, unlike in geostatistics, which typically deals with such issues. A basic geostatistical tool is the so-called variogram, which is also a bivariate measure of spatial dependence. Considering the indicator variogram, above some threshold, of a stationary max-stable random field, we propose a new nonparametric estimator of the extremal coefficient function based on the Nadaraya-Watson estimator of the variogram. The latter has been studied by Garcia-Soidan et al. (2004) and Garcia-Soidan (2019); from their work, we derive asymptotic properties of our estimator when it is computed from a single spatial set of observations. Namely, under some assumptions, we show that it is consistent and asymptotically normal. These results are illustrated by numerical experiments, and a comparison with the well-known F-madogram-based estimator is performed. An application to a real dataset is also presented. |
---|
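Under an assumption of unit Fréchet margins, F(u) = exp(-1/u), the indicator variogram and the extremal coefficient are linked explicitly. For a stationary max-stable field, P(Z(x) ≤ u, Z(x+h) ≤ u) = F(u)^θ(h), and the indicator variogram is γ_u(h) = F(u) - F(u)^θ(h). Inverting gives θ(h) = log(F(u) - γ_u(h)) / log F(u). The minimal sketch below plugs a simple empirical indicator variogram, computed along a regularly spaced 1-D transect, into this relation. It does not use the Nadaraya-Watson kernel estimator discussed in the talk, and the transect layout and function names are assumptions.

```python
import math
from typing import Sequence

def empirical_indicator_variogram(values: Sequence[float], u: float, lag: int) -> float:
    """Half the mean squared difference of exceedance indicators at a given
    lag along a regularly spaced 1-D transect (a single spatial realization)."""
    ind = [1.0 if v > u else 0.0 for v in values]
    diffs = [(ind[i] - ind[i + lag]) ** 2 for i in range(len(ind) - lag)]
    return 0.5 * sum(diffs) / len(diffs)

def extremal_coefficient(gamma_u: float, u: float) -> float:
    """Invert gamma_u(h) = F(u) - F(u)**theta(h), assuming unit Frechet margins."""
    f_u = math.exp(-1.0 / u)
    # Clip to the range reachable by a valid theta in [1, 2].
    gamma_u = min(max(gamma_u, 0.0), f_u - f_u ** 2)
    return math.log(f_u - gamma_u) / math.log(f_u)

u = 2.0
f_u = math.exp(-1.0 / u)
print(extremal_coefficient(0.0, u))             # complete dependence -> 1.0
print(extremal_coefficient(f_u - f_u ** 2, u))  # independence -> 2.0 (up to rounding)
print(empirical_indicator_variogram([0.5, 3.0, 0.5, 3.0], u=1.0, lag=1))  # -> 0.5
```

The clipping step matters in practice: with a single realization, the empirical variogram can fall outside the range allowed by θ in [1, 2], and the logarithm would then be undefined.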
|
17/01/2022 | Fanny Mollandin (INRAE, GABI) | | 11:00 - ACTIA room |
---|
| Accounting for complex overlapping annotations as biological priors in genomic prediction models of complex traits |
---|
| Abstract: The primary objective of genomic prediction is to use genomic variation, usually single nucleotide polymorphisms (SNPs), to predict complex phenotypes. In particular, genomic prediction models are widely used as an evaluation tool for genomic selection in plant and animal breeding. However, the prediction accuracy for many complex quantitative traits still has room for improvement, owing to factors such as marker density, the underlying genetic architecture, and population structure. At the same time, knowledge about the genome is steadily accumulating, including improved functional annotation and more widely available high-throughput molecular assays (i.e. omics data), which provide a bridge from genome variation to phenotypes. Integrating this information into genomic prediction models could lead to improved prediction accuracy and a better understanding of the underlying architecture of complex traits. Bayesian models provide a straightforward way to introduce known functional information into genomic prediction models through prior distributions. In particular, BayesRC divides SNPs into disjoint annotation categories, allowing the proportion of QTLs to vary in each. Although BayesRC has shown promising results, it is limited by the non-overlapping nature of the annotations, which prevents SNPs from belonging to more than one functional list. As the number of potential annotation categories increases, this constraint will become a key limitation. To address this issue, we present two novel extensions of BayesRC that handle potentially overlapping annotations through either a cumulative or a stochastic approach. Our approaches allow SNPs with multiple annotations to be, respectively, upweighted or preferentially assigned to the annotation that best characterizes them. We compare and evaluate these two models against state-of-the-art Bayesian genomic prediction models on simulated and real data, with a simultaneous focus on prediction quality and QTL mapping accuracy. |
---|
|
10/01/2022 | Gilles Blanchard (LMO, Université Paris Saclay) | | 11:00 - ACTIA room |
---|
| TBA |
---|
| Abstract: TBA |
---|
|
03/01/2022 | Mary Savino (MIA Paris, AgroParisTech) | | 11:00 - ACTIA room |
---|
| Statistical learning methods for the simulation of highly nonlinear problems for geological porous media and materials |
---|
| Abstract: Computational simulation models based on mathematical concepts and language are necessary to demonstrate the feasibility of a project and to support decision-making. At Andra (Agence Nationale pour la gestion des déchets radioactifs, the French national radioactive waste management agency), simulation models are needed to demonstrate the quality and safety of the Cigéo Project, a deep geological disposal facility for high-level, long-lived radioactive waste, before any construction work is undertaken. However, these computational simulations can require prohibitively large computational budgets and hinder fast decision-making. In this presentation, we propose a sequential data-driven method for equilibrium-based chemical simulations that builds what is commonly called a surrogate model, in order to circumvent these time-consuming and costly simulations. Our method is based on the assumption that the function to estimate is a sample path of a Gaussian Process (GP), which allows us to compute the global uncertainty of the function estimate. It can be seen as a specific machine learning (ML) approach called active learning, since it sequentially chooses the most relevant inputs at which the function to estimate has to be evaluated. Our active learning method is first validated through numerical experiments based on halite precipitation and then applied to a complex chemical system commonly used in geoscience. Finally, the method is applied to Reactive Transport Modeling. |
---|
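A common active-learning criterion for a Gaussian-process surrogate is to run the expensive simulator next wherever the posterior variance is largest. The GP posterior variance depends only on where the points were evaluated, not on the simulator outputs, so the sketch below never calls a simulator. It assumes a squared-exponential kernel with a fixed lengthscale, a small nugget term, and a 1-D candidate grid. The criterion used in the talk may differ (for example, an integrated-uncertainty criterion), and all names here are illustrative.

```python
import math
from typing import List

def rbf(a: float, b: float, lengthscale: float = 0.2) -> float:
    """Squared-exponential kernel with unit signal variance (assumed)."""
    return math.exp(-((a - b) ** 2) / (2 * lengthscale ** 2))

def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x

def posterior_variance(x: float, design: List[float], noise: float = 1e-8) -> float:
    """GP posterior variance k(x, x) - k_x^T (K + noise * I)^{-1} k_x."""
    k_mat = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(design)]
             for i, a in enumerate(design)]
    k_x = [rbf(x, d) for d in design]
    weights = solve(k_mat, k_x)
    return rbf(x, x) - sum(k * w for k, w in zip(k_x, weights))

def next_design_point(design: List[float], candidates: List[float]) -> float:
    """Pick the candidate input with the largest posterior variance."""
    return max(candidates, key=lambda c: posterior_variance(c, design))

candidates = [i / 10 for i in range(11)]
design = [0.0, 1.0]
for _ in range(3):
    # In a real workflow the simulator would be run at each new point here.
    design.append(next_design_point(design, candidates))
print(design)
```

Starting from the two endpoints, the first point chosen is the midpoint 0.5. Later points fill the remaining gaps, which is the space-filling behavior expected from a pure variance criterion.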
| Erme Anakok (MIA Paris, AgroParisTech) | | 11:30 - ACTIA room |
---|
| Magnetic and Mechanical Design of a 14 Tesla MRI Using Genetic Algorithms |
---|
| Abstract: Genetic algorithms are optimization procedures inspired by living organisms: mutation, crossover and selection are used to optimize functions over large state spaces. The selection phase can take into account various constraints of different forms, which makes it possible to solve constrained optimization problems. One example involving constraints of various forms is the design of a main coil for a 14 T MRI. It is a complex problem involving different types of physics, such as magnetism and mechanics, as well as requirements on the machinability of the device. By reformulating this design problem as a constrained optimization problem, genetic algorithms have paved the way for a new design philosophy in the field of superconducting magnets: the possibility of including all design steps in a single optimization process. |
---|
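The mutation, crossover and selection loop described above can be sketched on a toy one-dimensional problem, with the constraint handled at the selection stage. Everything in this example is an assumption for illustration and unrelated to the actual 14 T magnet design. The objective is (x - 3)², the constraint is x ≤ 2, and the constraint is enforced through a penalty on the violation. The algorithm uses tournament selection, blend crossover, Gaussian mutation and elitism.

```python
import random
from typing import List

LOW, HIGH = 0.0, 10.0

def penalized_cost(x: float) -> float:
    """Toy objective (x - 3)^2 with the constraint x <= 2 enforced by a penalty."""
    violation = max(0.0, x - 2.0)
    return (x - 3.0) ** 2 + 1000.0 * violation

def tournament(pop: List[float], rng: random.Random, size: int = 3) -> float:
    """Selection: return the lowest-cost individual among `size` random picks."""
    return min(rng.sample(pop, size), key=penalized_cost)

def genetic_algorithm(pop_size: int = 40, generations: int = 200, seed: int = 0) -> float:
    rng = random.Random(seed)
    pop = [rng.uniform(LOW, HIGH) for _ in range(pop_size)]
    for _ in range(generations):
        elite = min(pop, key=penalized_cost)   # elitism: keep the best as-is
        children = [elite]
        while len(children) < pop_size:
            a, b = tournament(pop, rng), tournament(pop, rng)
            w = rng.random()
            child = w * a + (1 - w) * b        # crossover: blend two parents
            child += rng.gauss(0.0, 0.1)       # mutation: small Gaussian step
            children.append(min(max(child, LOW), HIGH))  # stay inside the box
        pop = children
    return min(pop, key=penalized_cost)

best = genetic_algorithm()
print(best, penalized_cost(best))
```

The penalty weight (1000) is far larger than the slope of the objective at the boundary, so the search settles on the constraint boundary x = 2 rather than on the unconstrained optimum x = 3.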
|
27/12/2021 | Christmas holidays | | |
---|
|
20/12/2021 | Christmas holidays | | |
---|
|
13/12/2021 | Anne Sabourin (LTCI, Télécom Paris, IPP) | | 11:00 - ACTIA room |
---|
| Tail inverse regression for dimension reduction with extreme response |
---|
|
06/12/2021 | Félix Cheysson (LPSM, Sorbonne Université) | | 11:00 - ACTIA room |
---|
| Evolution of groups at risk of death from Covid-19 using hospital data |
---|
|
29/11/2021 | Anass Aghbalou (LTCI, Télécom Paris, IPP) | | 11:00 - ACTIA room |
---|
| Cross-validation for rare events |
---|
|
22/11/2021 | Perrine Lacroix (LMO, IPS2) | | 11:00 - ACTIA room |
---|
| Trade-off between predictive risk and false discovery rate for high-dimensional Gaussian linear regression |
---|
|
15/11/2021 | Clément Chadebec (Université de Paris, INRIA, INSERM) | | 11:00 - ACTIA room |
---|
| Data Augmentation in High Dimensional Low Sample Size Setting with Geometry-Aware Variational Autoencoders |
---|
|
08/11/2021 | Baptiste Kerleguer (CMAP, Ecole Polytechnique) | | 11:00 - ACTIA room |
---|
|
01/11/2021 | All Saints' Day | | |
---|
|
25/10/2021 | All Saints' holidays | | |
---|
|
18/10/2021 | Liliane Bel (MIA Paris, AgroParisTech) | | 11:00 - ACTIA room |
---|
| Variable selection for spatial models |
---|
|
11/10/2021 | Thanh Mai Pham Ngoc (LMO, Université Paris Saclay) | | 11:00 - ACTIA room |
---|
| Adaptive estimation of nonparametric geometric graphs |
---|
|
04/10/2021 | Mathis Chagneux (MIA Paris, AgroParisTech) | | 11:00 - ACTIA room |
---|
| Macrolitter video counting on river banks with state space models for moving cameras |
---|
|
27/09/2021 | State of The R | | 11:00 - ACTIA room |
---|
| Feedback from the week in Roscoff |
---|
|
20/09/2021 | Wencan Zhu (AgroParisTech & Sanofi) | | 11:00 - ACTIA room |
---|
| A variable selection approach for highly correlated predictors in high-dimensional data |
---|
|
13/09/2021 | Tâm Le Minh (MIA Paris, AgroParisTech) | | 11:00 - ACTIA room |
---|
| Comparison of ecological interaction networks using exchangeable probabilistic models |
---|