Yuexin Ma, Xiang Li, Xiaohao Ji, Chunying Wang, Di Zhang, Tingting Zhai, Haibo Wang, Ping Liu
Abstract
Genomic selection (GS) is critical for accelerating genetic gain in modern plant breeding. Deep learning approaches offer powerful non-linear representation capabilities for modelling non-additive effects. However, their application in GS remains limited, as high-dimensional, low-sample and noisy data hinder the identification of informative markers. The present study proposes DNAwhisper, a deep learning framework designed for multi-trait prediction and adaptive marker prioritisation. The framework integrates a cascaded architecture, GFIformer, which employs shared network parameters across partitioned marker blocks to adaptively compress genetic features within a hierarchical pyramid. Pre-training on population genetic structure regularises feature learning to establish a generalisable latent representation. During trait modelling, importance scores for aggregated genomic regions at multiple resolutions are extracted from the distinct pyramid levels under trait-guided deep supervision, enhancing interpretability and supporting marker prioritisation. DNAwhisper was evaluated on maize, wheat, tomato and grape datasets for marker prioritisation and phenotypic prediction, achieving prediction accuracy approximately 3.0% to 10.0% higher than the baseline model. Furthermore, DNAwhisper identifies major QTLs (e.g., VGT1, ZCN8) and epistatic signals within the gibberellin metabolic pathway across maize flowering traits. This framework provides a new strategy for dissecting the genetic architecture of complex traits.
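The idea of shared parameters across partitioned marker blocks within a hierarchical pyramid can be illustrated with a minimal sketch. This is not the authors' GFIformer: plain dense layers with a tanh activation stand in for whatever layers the paper uses, and the function names (`init_level`, `compress`, `build_pyramid`), the block sizes and the feature width are hypothetical choices made only for illustration. Pre-training, deep supervision and importance scoring are not covered.

```python
import numpy as np

def init_level(rng, in_dim, out_dim):
    """Create ONE weight set for a pyramid level; every block at that level reuses it."""
    return {
        "W": rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim)),
        "b": np.zeros(out_dim),
    }

def compress(x, params, group):
    """Merge `group` adjacent units and map each merged block with the same weights.

    x has shape (n_samples, n_units, n_features); the result has shape
    (n_samples, n_units // group, out_dim).
    """
    n, units, feat = x.shape
    assert units % group == 0, "number of units must be divisible by the group size"
    merged = x.reshape(n, units // group, group * feat)   # concatenate adjacent units
    return np.tanh(merged @ params["W"] + params["b"])    # shared weights on every block

def build_pyramid(genotypes, level_params, groups):
    """Cascade the compression; return the representation at every pyramid level."""
    x = genotypes[:, :, None].astype(float)  # each marker starts as a unit with 1 feature
    levels = []
    for params, group in zip(level_params, groups):
        x = compress(x, params, group)
        levels.append(x)
    return levels

# Toy data: 8 individuals, 64 markers coded 0/1/2 (assumed encoding).
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(8, 64))

groups = (8, 4, 2)   # hypothetical block sizes per level
width = 4            # hypothetical feature width
level_params = [
    init_level(rng, 8 * 1, width),      # level 1: 8 markers x 1 feature per block
    init_level(rng, 4 * width, width),  # level 2: 4 blocks x `width` features
    init_level(rng, 2 * width, width),  # level 3: 2 blocks x `width` features
]

pyramid = build_pyramid(genotypes, level_params, groups)
print([level.shape for level in pyramid])  # [(8, 8, 4), (8, 2, 4), (8, 1, 4)]
```

Because each level holds a single weight set regardless of how many blocks it processes, the parameter count depends on the block size rather than on the total number of markers, which is one plausible reason such sharing suits high-dimensional, low-sample genotype data. Each entry of `pyramid` corresponds to genomic regions at a coarser resolution than the previous one.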
Paper link: https://onlinelibrary.wiley.com/doi/10.1111/pbi.70619