I recently graduated with a B.Sc. in Chemistry and Biological Chemistry (Honors, Distinction) from Nanyang Technological University (NTU), Singapore, and I'm now working as a Research Assistant there.
Over the past few years, I've moved across the spectrum of bioinformatics: from studying epigenomic marks, to questioning whether transcriptomic platforms are truly comparable, to my current work on how protein language models might be redesigned to reason in motifs rather than individual amino acids. The thread connecting all of it is a curiosity about what's conserved, what varies, and what that tells us.
Along the way I've also picked up a Silver at the ST Engineering Hackathon and participated in a few other mini projects, mostly to dabble in areas outside my usual domain.
Journal of Bioinformatics and Computational Biology, World Scientific
DOI →
A timeline of the questions I've been asking (roughly in chronological order).
Centre for Biomedical Informatics, Lee Kong Chian School of Medicine, NTU · Prof Bernett Lee
How different are epigenomic marks across cell types?
I started by asking whether epigenomic marks, specifically H3K4me3 and DNase, are truly distinct across cell types. The answer turned out to be counterintuitive: many cell types share a large number of accessible sites for transcription. Since the presence of these marks alone isn't sufficient to distinguish cell identity, I investigated whether the extent of overlap could be useful. By classifying genes according to their level of overlap, we could separate genes that are more telling of cell identity from those that are more generic. This led to a first-author journal publication and an oral presentation at GIW ISCB-Asia 2023.
Computational Biology Group, Nanjing University · Prof Dijun Chen
How much transcriptional noise is too much — and how would we even know?
Single-cell sequencing captures snapshots of individual cells, but each cell only samples a fraction of its transcriptome. The data is inherently sparse and noisy. I explored whether variational autoencoders could denoise these profiles without erasing genuine biological signal. I also investigated how to evaluate the quality of what's recovered. The harder (unanswered) question turned out to be: when does "denoising" become distortion?
Computational Biology & Omics Lab, Bioinformatics Institute, A*STAR · Prof Kumar Selvarajoo
Are pseudo-bulk profiles fully equivalent to bulk RNA-seq? What do we lose by assuming they are?
Pseudo-bulk RNA-seq, where single-cell counts are aggregated per sample, is increasingly used as a proxy for bulk data. But I found that one can reliably tell the two modalities apart with a simple PCA, which suggests the aggregation process introduces systematic differences. This raises uncomfortable questions about reproducibility: are findings from one platform truly comparable to the other? It's a subtler problem than it looks, and the answer has implications for how we interpret most published transcriptomic studies.
School of CCEB, NTU · Prof Meng How Tan
Are conserved motifs useful for de novo protein design?
Evolution doesn't tinker at random: it conserves structural and functional motifs across billions of years. Our team asked whether these conserved motifs could serve as atomic units for de novo protein design: not generating amino acids one by one, but composing proteins from a vocabulary of known functional units. We generated de novo Cas13 variants, then validated their activity in the wet lab. Their knockdown efficacy was comparable to that of the wild type, a sign that the conserved motifs have some biological relevance.
School of CCEB, NTU · Prof Dan He
Can a protein language model built on motifs be more interpretable — and more correct?
Building on the iGEM work, I'm developing a motif-level language model that treats functional protein motifs as its vocabulary rather than individual residues. The hypothesis is that this is closer to how evolution actually encodes function — and may produce more biologically grounded outputs. Current work focuses on metalloproteins, where the tight constraints around metal-binding sites make motif conservation especially pronounced and testable.
I'm always open to research collaborations, opportunities, or just a good conversation about biology and data. Feel free to reach out.