The rapid growth of spatially resolved transcriptomics technology provides new perspectives on spatial tissue architecture. Deep learning has been widely applied to derive useful representations for spatial transcriptome analysis. However, effectively integrating spatial multi-modal data remains challenging. Here, we present ConGcR, a contrastive learning-based model that integrates gene expression, spatial location, and tissue morphology for data representation and spatial tissue …