EnDeep4mC: A dual-adaptive feature encoding framework in deep ensembles for predicting DNA N4-methylcytosine sites
A deep learning ensemble predictor for DNA N4-methylcytosine (4mC) modification sites.
It combines CNN, BiLSTM, and Transformer base models with dual-adaptive feature encoding.
Variable-Length Sequence Processing
EnDeep4mC processes DNA sequences of any length from 20 to 100,000 bp using the following pipeline:
1. **Length Standardization**: All sequences are standardized to 41bp windows through appropriate padding or segmentation.
2. **Cytosine Filtering**: Only windows with a cytosine (C) at the center position are analyzed for 4mC prediction.
3. **Sliding Window Analysis**: Long sequences are analyzed using overlapping 41bp windows with a 1bp step size.
4. **Ensemble Prediction**: Each 41bp window is scored by the CNN, BiLSTM, and Transformer ensemble.
Note: For sequences longer than 41bp, predictions are averaged per cytosine position to produce robust methylation likelihood scores; the sketch below illustrates the windowing and scoring logic.
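The per-window logic of steps 2-4 can be pictured with a short Python sketch. `score_cytosines`, `ensemble_predict`, and `predict_fn` are hypothetical stand-ins for the server's actual code, shown here only to make the pipeline concrete:

```python
import numpy as np

WINDOW = 41            # fixed analysis window (step 1)
CENTER = WINDOW // 2   # 0-based index 20: the candidate cytosine

def score_cytosines(seq, predict_fn):
    """Slide a 41bp window along `seq` at a 1bp step (step 3) and score
    every window whose center base is a cytosine (step 2).
    `predict_fn` maps a 41bp string to a methylation probability."""
    seq = seq.upper()
    scores = {}
    for start in range(len(seq) - WINDOW + 1):
        window = seq[start:start + WINDOW]
        if window[CENTER] == "C":              # cytosine filtering
            scores[start + CENTER] = predict_fn(window)
    return scores                              # {position: probability}

def ensemble_predict(window, models):
    """Step 4: average the CNN, BiLSTM, and Transformer probabilities
    for one window (the server's exact combination rule may differ)."""
    return float(np.mean([m(window) for m in models]))
```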
Model Architecture Details
Table 1. Architecture and Hyperparameters of Base Deep Learning Models
| Component | CNN | BiLSTM | Transformer |
|---|---|---|---|
| Input | (1, feature_dim) | (1, feature_dim) | (None, feature_dim) |
| Layer 1 | Conv1D(256, 1) + BN | BiLSTM(128) | MultiHead(8 heads) |
| Layer 2 | SepConv1D(128, 3) + Pool | BN | FFN(512) + LayerNorm |
| Layer 3 | Conv1D(64, 1) | BiLSTM(64) | 2 encoder layers |
| Pooling | GlobalMaxPool | - | GlobalAvgPool |
| Dense | Dense(128) | Dense(64) | Dense(128) |
| Output | Dense(1, sigmoid) | Dense(1, sigmoid) | Dense(1, sigmoid) |
| Regularization | L2(0.001), Drop(0.3) | L2(0.001), Drop(0.2), RecDrop(0.1) | L2(0.001), Drop(0.1) |
| Optimizer | Adam(lr=0.001, clip=1.0) | Adam(lr=0.001, clip=1.0) | Adam(lr=0.001) |
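To illustrate the CNN column, here is a minimal tf.keras sketch wiring the Table 1 settings together. Activation functions, the pooling size, and the exact placement of dropout are assumptions, since the table does not specify them:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_cnn(feature_dim, l2_coef=1e-3, drop=0.3):
    """CNN branch per Table 1 (a sketch, not the authors' code)."""
    reg = regularizers.l2(l2_coef)
    inputs = tf.keras.Input(shape=(1, feature_dim))
    x = layers.Conv1D(256, 1, padding="same", activation="relu",
                      kernel_regularizer=reg)(inputs)       # Layer 1: Conv1D(256,1)
    x = layers.BatchNormalization()(x)                      # + BN
    x = layers.SeparableConv1D(128, 3, padding="same", activation="relu",
                               kernel_regularizer=reg)(x)   # Layer 2: SepConv1D(128,3)
    x = layers.MaxPooling1D(pool_size=1)(x)                 # + Pool (size assumed)
    x = layers.Conv1D(64, 1, activation="relu",
                      kernel_regularizer=reg)(x)            # Layer 3: Conv1D(64,1)
    x = layers.GlobalMaxPooling1D()(x)                      # GlobalMaxPool
    x = layers.Dense(128, activation="relu",
                     kernel_regularizer=reg)(x)             # Dense(128)
    x = layers.Dropout(drop)(x)                             # Drop(0.3)
    outputs = layers.Dense(1, activation="sigmoid")(x)      # Output
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3,
                                                     clipnorm=1.0),
                  loss="binary_crossentropy", metrics=["AUC"])
    return model
```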
Table 2. Configuration of Ensemble Learning Framework
| Component | XGBoost Configuration | LightGBM Configuration | Meta-Learner Configuration |
|---|---|---|---|
| Model Type | XGBClassifier | LGBMClassifier | LogisticRegression |
| Number of Trees | n_estimators=500 | n_estimators=300 | - |
| Learning Rate | 0.05 | 0.05 | - |
| Depth | max_depth=7 | num_leaves=63 | - |
| Regularization | gamma=0.1, subsample=0.8 | reg_alpha=0.2, reg_lambda=0.2 | C=0.6, l1_ratio=0.5 |
| Others | colsample_bytree=0.8 | min_child_samples=20 | penalty='elasticnet', solver='saga' |
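The Table 2 settings translate directly into a scikit-learn stacking setup. Whether EnDeep4mC uses `StackingClassifier` or a custom out-of-fold stacking loop is not stated here, so treat this as one plausible wiring of the documented hyperparameters:

```python
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# Base learners and meta-learner configured as in Table 2.
xgb = XGBClassifier(n_estimators=500, learning_rate=0.05, max_depth=7,
                    gamma=0.1, subsample=0.8, colsample_bytree=0.8)
lgbm = LGBMClassifier(n_estimators=300, learning_rate=0.05, num_leaves=63,
                      reg_alpha=0.2, reg_lambda=0.2, min_child_samples=20)
meta = LogisticRegression(penalty="elasticnet", solver="saga",
                          C=0.6, l1_ratio=0.5, max_iter=5000)

ensemble = StackingClassifier(
    estimators=[("xgb", xgb), ("lgbm", lgbm)],
    final_estimator=meta,
    stack_method="predict_proba",   # meta-learner sees base probabilities
)
# Usage: ensemble.fit(X_train, y_train); ensemble.predict_proba(X_test)
```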
Model Architecture Diagram
Integrated Deep Learning Architecture with Dual-Adaptive Encoding System
Example FASTA Format
>Sample_Sequence_1 (41bp)
CTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGC
>Sample_Sequence_2 (50bp)
CGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCG
>Sample_Sequence_3 (30bp)
ATCGATCGATCGATCGATCGATCGATCGAT
Variable-Length Examples: The webserver accepts sequences of any length from 20 to 100,000 bp.
Short sequences are standardized (padded) to 41bp, and long sequences are analyzed using sliding windows; a minimal sketch of the standardization step follows.
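For completeness, here is a small sketch of reading FASTA input and padding short sequences to 41bp. The symmetric `N` padding is an assumption for illustration; the server's actual standardization rule may differ:

```python
def read_fasta(path):
    """Minimal FASTA reader: yields (header, sequence) pairs."""
    header, chunks = None, []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def standardize(seq, window=41, pad="N"):
    """Pad a short sequence symmetrically up to 41bp; sequences at or
    above 41bp are left for the sliding-window path."""
    if len(seq) >= window:
        return seq
    total = window - len(seq)
    left = total // 2
    return pad * left + seq + pad * (total - left)
```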