About NuLocVar
Subcellular localization is fundamental to protein function, orchestrating intracellular processes and cellular responses to external signals. Genetic variants can disrupt protein nucleocytoplasmic shuttling, thereby driving disease progression and tumorigenesis. Recently, we developed the pSAM model, which predicts protein nuclear localization with an AUC of 0.9865 on the test dataset. This model deciphers both canonical and non-canonical bona fide sequence determinants of nucleocytoplasmic shuttling, and its ability to prioritize cancer mutations that alter protein localization has been validated through extensive experiments (Nature Communications, 2025, 16:2511).
Here, we present NuLocVar, a comprehensive database integrating 12,703,874 nonsynonymous single nucleotide variants (nsSNVs) across the human proteome. The pSAM model identified 604,135 variants within sequence determinants for nuclear localization (DNL), far surpassing the 18,906 variants found within known nuclear localization signal (NLS) and nuclear export signal (NES) motifs. Additionally, 75,415 variants were predicted to shift nuclear localization probability between the Non-Nucleus level and the high, medium, or low levels. NuLocVar also incorporates extensive functional and disease annotations, as well as subcellular localization information from multiple resources. Overall, NuLocVar serves as a data repository for exploring the consequences of genetic variation on nuclear localization.
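To make the idea of a level transition concrete, the following is a minimal sketch of how a variant could be flagged when its predicted probability moves from one localization level to another. The cut-off values, function names, and probability pairs are assumptions made for illustration only; this section does not state the thresholds NuLocVar uses, and the sketch counts any change among the four levels.

```python
from collections import Counter

# Hypothetical probability cut-offs (assumption): the thresholds actually used
# by NuLocVar are not given in this section.
LEVEL_CUTOFFS = [(0.9, "high"), (0.7, "medium"), (0.5, "low")]


def localization_level(prob: float) -> str:
    """Map a nuclear localization probability to a level label."""
    if not 0.0 <= prob <= 1.0:
        raise ValueError(f"probability out of range: {prob}")
    for cutoff, label in LEVEL_CUTOFFS:
        if prob >= cutoff:
            return label
    return "Non-Nucleus"


def level_transition(wt_prob: float, mut_prob: float):
    """Return (wild-type level, mutant level) if the level changes, else None."""
    before = localization_level(wt_prob)
    after = localization_level(mut_prob)
    return (before, after) if before != after else None


# Toy (wild-type, mutant) probability pairs, invented for illustration.
pairs = [(0.95, 0.40), (0.95, 0.93), (0.30, 0.75), (0.60, 0.45)]
transitions = Counter(
    t for t in (level_transition(w, m) for w, m in pairs) if t is not None
)
for (before, after), n in sorted(transitions.items()):
    print(f"{before} -> {after}: {n}")
```

Tallying transitions per (before, after) pair in this way yields the kind of summary a level-transition chart would display.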
12,703,874 Variants
Collecting all human nsSNVs from five databases.
pSAM Prediction
Predicting nuclear localization probability and site-specific contributions of proteins.
Various Annotations
Integrating annotations on variants, NLS/NES regions, and subcellular localization.
(1) Variant-induced transitions between nuclear localization levels
(2) Number of DNL/NES/NLS and number of variants occurring within these determinants
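Counting the variants that occur within determinants amounts to an interval-membership test per protein. The sketch below shows one way to do it, assuming 1-based inclusive residue coordinates; the function names and all coordinates are hypothetical and are not taken from NuLocVar.

```python
from bisect import bisect_right


def merge_regions(regions):
    """Merge overlapping or adjacent 1-based inclusive (start, end) regions."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def count_variants_in_regions(positions, regions):
    """Count variant positions that fall inside at least one region."""
    merged = merge_regions(regions)
    starts = [start for start, _ in merged]
    hits = 0
    for pos in positions:
        i = bisect_right(starts, pos) - 1  # last region starting at or before pos
        if i >= 0 and pos <= merged[i][1]:
            hits += 1
    return hits


# Toy data for a single protein (all coordinates invented for illustration).
dnl_regions = [(10, 25), (20, 40), (120, 135)]
known_nls_nes = [(12, 18)]
variant_positions = [5, 12, 30, 41, 120, 136]

n_dnl = count_variants_in_regions(variant_positions, dnl_regions)
n_known = count_variants_in_regions(variant_positions, known_nls_nes)
print(f"variants in DNL: {n_dnl}; variants in known NLS/NES: {n_known}")
```

Merging the regions first ensures that a variant lying in two overlapping determinants is counted once.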
Variant data used in NuLocVar
| Resource | Description | URL |
|---|---|---|
| OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| dbNSFP | Database for nonsynonymous SNPs' functional predictions | https://www.dbnsfp.org/ |
| GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/ |
| ClinVar | Genomic variation and its relationship to human health | https://www.ncbi.nlm.nih.gov/clinvar/ |
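
Because the same substitution can be reported by more than one of these resources, pooling them requires a shared key. The following is a minimal sketch of such a merge, assuming each record has already been mapped to a protein-level key; the `Variant` schema, the placeholder accessions, and the toy records are hypothetical and do not describe NuLocVar's actual pipeline.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """Protein-level key for one nsSNV (hypothetical schema)."""
    accession: str  # protein accession (placeholder values below)
    position: int   # 1-based residue position
    ref: str        # reference amino acid
    alt: str        # substituted amino acid


def merge_sources(sources):
    """Collapse per-resource variant lists into {Variant: sorted resource names}."""
    merged = defaultdict(set)
    for resource, variants in sources.items():
        for v in variants:
            if v.ref != v.alt:  # keep only amino-acid-changing records
                merged[v].add(resource)
    return {v: sorted(names) for v, names in merged.items()}


# Toy input, invented for illustration.
sources = {
    "ClinVar": [Variant("ACC_A", 175, "R", "H")],
    "COSMIC": [Variant("ACC_A", 175, "R", "H"), Variant("ACC_A", 248, "R", "Q")],
    "dbNSFP": [Variant("ACC_A", 72, "P", "P")],  # no residue change, dropped
}
merged = merge_sources(sources)
for variant, resources in merged.items():
    print(variant, resources)
```

Keeping the list of contributing resources per variant preserves provenance after de-duplication.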
Annotation sources used in NuLocVar
| Data type | Resource | Description | URL |
|---|---|---|---|
| Basic information | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Disease and functional annotation | OncoKB | A precision oncology knowledge base | http://oncokb.org/ |
| Disease and functional annotation | dbNSFP | Database for nonsynonymous SNPs' functional predictions | https://www.dbnsfp.org/ |
| Disease and functional annotation | GWAS | Human genome-wide association studies | https://www.ebi.ac.uk/gwas/ |
| Disease and functional annotation | COSMIC | Catalogue of Somatic Mutations in Cancer | https://cancer.sanger.ac.uk/cosmic/ |
| Disease and functional annotation | ClinVar | Genomic variation and its relationship to human health | https://www.ncbi.nlm.nih.gov/clinvar/ |
| Experimental NLS/NES region | SeqNLS | Nuclear localization signal prediction | http://mleg.cse.sc.edu/seqNLS/ |
| Experimental NLS/NES region | ValidNESs | Validated NES-containing proteins, functional NES sites and NES predictions | http://validness.ym.edu.tw/ |
| Experimental NLS/NES region | NESbase | Nuclear export signal database | https://www.nesbase.org/ |
| Experimental NLS/NES region | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Predicted NLS/NES region | NLSdb | Nuclear localization signal database | https://service.rostlab.org/nlsdb/ |
| Experimental subcellular localization | UniProt | Universal protein resource | https://www.uniprot.org/ |
| Experimental subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/ |
| Predicted subcellular localization | Compartments | Subcellular localization database | https://compartment.jensenlab.org/ |
| Predicted subcellular localization | Translocatome | Predicted translocating proteins from human cells | http://translocatome.linkgroup.hu |
| Post-translational modification annotation | qPTM | Quantification of post-translational modifications | https://qptm.omicsbio.info/ |
| 3D structure | PDB | Protein data bank | https://www.rcsb.org/ |
| Physical and chemical properties of amino acids | AAindex | Amino acid index database | https://www.genome.jp/aaindex/ |
Tools used in NuLocVar
| Tool | Description | URL |
|---|---|---|
| pSAM | A deep learning model for predicting nuclear localization probability and site-specific contributions of proteins | https://github.com/lzxlab/pSAM |
| IUPred | A tool that predicts the tendency of amino acids to be in disordered regions based on energy estimation and experimental annotation | https://iupred1.elte.hu/ |
| NetSurfP | A tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence | https://services.healthtech.dtu.dk/services/NetSurfP-1.0/ |
Data Collection
Collection of human nonsynonymous single nucleotide variants (nsSNVs) from ClinVar, COSMIC, OncoKB, dbNSFP, and GWAS
pSAM Prediction
Prediction of nsSNV-induced changes in nuclear localization probability and in determinants for nuclear localization (DNL)
Variant Annotation
Integration of disease and functional annotation from OncoKB, dbNSFP, GWAS, COSMIC, and ClinVar
Localization Annotation
Integration of experimental and predicted NLS/NES regions and subcellular localization data from UniProt, NLSdb, SeqNLS, ValidNESs, NESbase, Compartments, and Translocatome
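
The two annotation steps can be thought of as a lookup join: each variant keeps its own fields and gains one field per annotation resource. The sketch below illustrates that idea under simplifying assumptions; the dictionary layout, the `accession` key, and the toy values are hypothetical, and the real resources provide far richer records.

```python
def annotate(variants, annotations_by_resource):
    """Attach per-protein annotations from each resource to variant records."""
    annotated = []
    for variant in variants:
        record = dict(variant)  # copy so the input records stay untouched
        for resource, table in annotations_by_resource.items():
            # A protein missing from a resource gets None instead of being dropped.
            record[resource] = table.get(variant["accession"])
        annotated.append(record)
    return annotated


# Toy inputs, invented for illustration.
variants = [
    {"accession": "ACC_A", "change": "R10H"},
    {"accession": "ACC_B", "change": "K5E"},
]
annotations_by_resource = {
    "UniProt": {"ACC_A": "Nucleus"},
    "Compartments": {"ACC_A": "Nucleus", "ACC_B": "Cytosol"},
}

rows = annotate(variants, annotations_by_resource)
for row in rows:
    print(row)
```

Filling gaps with `None` keeps every variant in the output, so missing annotations remain visible rather than silently shrinking the dataset.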
