About NuLocVar

Introduction

Subcellular localization is fundamental to protein function, orchestrating intracellular processes and cellular responses to external signals. Genetic variants can disrupt protein nucleocytoplasmic shuttling, thereby driving disease progression and tumorigenesis. We recently developed the pSAM model, which predicts protein nuclear localization with an AUC of 0.9865 on the test dataset. The model deciphers both canonical and non-canonical bona fide sequence determinants of nucleocytoplasmic shuttling, and its ability to prioritize cancer mutations that alter protein localization has been validated through extensive experiments (Nature Communications, 2025, 16:2511).

Here, we present NuLocVar, a comprehensive database integrating 12,703,874 nonsynonymous single nucleotide variants (nsSNVs) across the human proteome. The pSAM model identified 604,135 variants within sequence determinants for nuclear localization (DNL), roughly 32 times the 18,906 variants found within known NLS/NES motifs. Additionally, 75,415 variants were predicted to shift the nuclear localization probability between Non-Nucleus and the high, medium, or low levels. NuLocVar also incorporates extensive functional and disease annotations, as well as subcellular localization information from multiple resources. Overall, NuLocVar serves as a data repository for exploring the consequences of genetic variation on nuclear localization.
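As an illustration of what a shift between localization levels means, the following minimal sketch bins a predicted nuclear localization probability into the four levels named above and reports whether a variant moves a protein from one level to another. The cut-off values and the function names (`level`, `transition`, `crosses_nuclear_boundary`) are hypothetical placeholders chosen for the example; the thresholds actually used by NuLocVar are not stated on this page.

```python
from bisect import bisect_right

# Hypothetical cut-offs, for illustration only (not NuLocVar's real thresholds).
CUTOFFS = (0.5, 0.7, 0.9)
LEVELS = ("Non-Nucleus", "low", "medium", "high")


def level(prob: float) -> str:
    """Map a predicted nuclear localization probability to a level name."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    # bisect_right counts how many cut-offs are <= prob.
    return LEVELS[bisect_right(CUTOFFS, prob)]


def transition(wt_prob: float, mut_prob: float):
    """Return (level_before, level_after) if the variant changes the level, else None."""
    before, after = level(wt_prob), level(mut_prob)
    return None if before == after else (before, after)


def crosses_nuclear_boundary(wt_prob: float, mut_prob: float) -> bool:
    """True when exactly one of the two predictions falls in the Non-Nucleus level."""
    return (level(wt_prob) == "Non-Nucleus") != (level(mut_prob) == "Non-Nucleus")
```

Under these assumed cut-offs, a variant that lowers the prediction from 0.95 to 0.20 would be reported as a transition from high to Non-Nucleus, whereas a change from 0.80 to 0.85 stays within the medium level and is not reported.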

12,703,874 Variants

Collecting all human nsSNVs from five databases.

pSAM Prediction

Predicting nuclear localization probability and site-specific contributions of proteins.

Various Annotations

Integrating annotations about variation, NES/NLS, and subcellular localization.

Browse

(1) Variant-induced transitions between nuclear localization levels

(2) Number of DNL/NES/NLS and number of variants occurring within these determinants
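Counting variants that occur within determinants amounts to an interval check on the protein sequence. The sketch below assumes 1-based, inclusive residue coordinates and uses a hypothetical `Region` record; it does not reflect the actual NuLocVar schema.

```python
from typing import Iterable, NamedTuple


class Region(NamedTuple):
    kind: str   # "DNL", "NLS" or "NES"
    start: int  # first residue, 1-based, inclusive
    end: int    # last residue, 1-based, inclusive


def regions_hit(position: int, regions: Iterable[Region]) -> list:
    """Return the regions that contain the given residue position."""
    return [r for r in regions if r.start <= position <= r.end]


def count_variants_by_kind(positions: Iterable[int], regions: list) -> dict:
    """Count variant positions per region kind; a variant counts once per kind."""
    counts = {}
    for pos in positions:
        for kind in {r.kind for r in regions_hit(pos, regions)}:
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

A variant that lies in overlapping regions of different kinds (for example an NLS inside a larger DNL) is counted once for each kind.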

Data Sources

Variant data used in NuLocVar

| Resource | Description | URL |
| --- | --- | --- |
| OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| dbNSFP | Database for nonsynonymous SNPs' functional predictions | https://www.dbnsfp.org/ |
| GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/ |
| ClinVar | Genomic variation and its relationship to human health | https://www.ncbi.nlm.nih.gov/clinvar/ |

Annotation sources used in NuLocVar

| Data type | Resource | Description | URL |
| --- | --- | --- | --- |
| Basic information | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Disease and functional annotation | OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| Disease and functional annotation | dbNSFP | Database for nonsynonymous SNPs' functional predictions | https://www.dbnsfp.org/ |
| Disease and functional annotation | GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| Disease and functional annotation | COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/ |
| Disease and functional annotation | ClinVar | Genomic variation and its relationship to human health | https://www.ncbi.nlm.nih.gov/clinvar/ |
| Experimental NLS/NES region | SeqNLS | Nuclear localization signal prediction | http://mleg.cse.sc.edu/seqNLS/ |
| Experimental NLS/NES region | ValidNESs | Validated NES-containing proteins, functional NES sites and NES predictions | http://validness.ym.edu.tw/ |
| Experimental NLS/NES region | NESbase | Nuclear export signal database | https://www.nesbase.org/ |
| Experimental NLS/NES region | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Predicted NLS/NES region | NLSdb | Nuclear localization signal database | https://service.rostlab.org/nlsdb/ |
| Experimental subcellular localization | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Experimental subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/ |
| Predicted subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/ |
| Predicted subcellular localization | Translocatome | Predicted translocating proteins from human cells | http://translocatome.linkgroup.hu |
| Post-translational modification annotation | qPTM | Quantification of post-translational modifications | https://qptm.omicsbio.info/ |
| 3D structure | PDB | Protein data bank | https://www.rcsb.org/ |
| Physical and chemical properties of amino acids | AAindex | Amino acid index database | https://www.genome.jp/aaindex/ |

Tools used in NuLocVar

| Tool | Description | URL |
| --- | --- | --- |
| pSAM | A deep learning model for predicting nuclear localization probability and site-specific contributions of proteins | https://github.com/lzxlab/pSAM |
| IUPred | A tool that predicts the tendency of amino acids to be in disordered regions based on energy estimation and experimental annotation | https://iupred1.elte.hu/ |
| NetSurfP | A tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence | https://services.healthtech.dtu.dk/services/NetSurfP-1.0/ |

Pipeline Construction

1. Data Collection

Collection of human nonsynonymous single nucleotide variants (nsSNVs) from ClinVar, COSMIC, OncoKB, dbNSFP, and GWAS

2. pSAM Prediction

Prediction of nuclear localization probability and determinants for nuclear localization (DNL) induced by nsSNVs

3. Variant Annotation

Integration of disease and functional annotation from OncoKB, dbNSFP, GWAS, COSMIC, and ClinVar

4. Localization Annotation

Integration of experimental and predicted subcellular localization data from UniProt, NLSdb, SeqNLS, ValidNESs, NESbase, Compartments, and Translocatome
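
The data collection step can be pictured as merging per-source variant lists into one non-redundant set while remembering which resources reported each variant. The following is a minimal sketch under stated assumptions: the `Variant` record, its field names, and the `merge_sources` function are hypothetical, and the actual NuLocVar pipeline may key and filter variants differently.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    uniprot_id: str  # protein accession (hypothetical key choice)
    position: int    # 1-based residue position
    ref_aa: str      # reference amino acid
    alt_aa: str      # substituted amino acid


def merge_sources(sources: Dict[str, Iterable[Variant]]) -> Dict[Variant, Set[str]]:
    """Merge variants from several resources, recording the sources of each one."""
    merged = defaultdict(set)
    for source_name, variants in sources.items():
        for variant in variants:
            # Keep only substitutions that change the amino acid.
            if variant.ref_aa != variant.alt_aa:
                merged[variant].add(source_name)
    return dict(merged)
```

Because `Variant` is a hashable tuple, identical substitutions reported by several resources collapse into a single entry whose value lists every contributing source.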

About the authors

This study was performed by Jia-min Hu, Jun Wu and Ze-Xian Liu, all of whom are from

Sun Yat-sen University Cancer Center,

Building 2#, 651 Dongfeng East Road,

Guangzhou 510060, P. R. China


Email: liuzx AT sysucc.org.cn

Tel/Fax: +86-20-87342025