Boon How Low

About

I recently graduated with a B.Sc. in Chemistry and Biological Chemistry (Honors, Distinction) from Nanyang Technological University (NTU), Singapore, and I'm now working as a Research Assistant there.

Over the past few years, I've moved across the spectrum of bioinformatics: from studying epigenomic marks, to questioning whether different transcriptomic platforms yield comparable data, to my current focus on how protein language models might be redesigned to reason in motifs rather than individual amino acids. The thread connecting all of it is a curiosity about what's conserved, what varies, and what that tells us.

Along the way I've also picked up a Silver at the ST Engineering Hackathon and participated in a few other mini projects, mostly to ~~procrastinate~~ dabble in areas outside my usual domain.

Things I think about

Conserved signals in biology
Generative models for biological sequences
Multi-omics data fidelity
Transcriptomics & epigenomics

Research Journey

A timeline of the questions I've been asking (roughly in chronological order).

2022 - 2024

Research Intern

Centre for Biomedical Informatics, Lee Kong Chian School of Medicine, NTU · Prof Bernett Lee

How different are epigenomic marks across cell types?

I started by asking whether epigenomic marks (specifically H3K4me3 and DNase) are truly distinct across cell types. The answer turned out to be counterintuitive: many cell types share a large number of sites accessible for transcription. Since the presence of these marks alone isn't sufficient to distinguish cell identity, I investigated whether the extent of overlap across cell types could be useful instead. By classifying genes according to their level of overlap, we could separate genes that are more telling of cell identity from those that are more generic. This led to a first-author journal publication and an oral presentation at GIW ISCB-Asia 2023.

Epigenomics · Chromatin Accessibility · Cell Identity · Transcriptomics · Differential Expression

Jul - Aug 2024

Global Research Immersion Program (GRIPS) · Poster Distinction

Computational Biology Group, Nanjing University · Prof Dijun Chen

How much transcriptional noise is too much — and how would we even know?

Single-cell sequencing captures snapshots of individual cells, but each cell only samples a fraction of its transcriptome. The data is inherently sparse and noisy. I explored whether variational autoencoders could denoise these profiles without erasing genuine biological signal. I also investigated how to evaluate the quality of what's recovered. The harder (unanswered) question turned out to be: when does "denoising" become distortion?

scRNA-seq · Single-cell Biology · Generative Models · Data Quality

Aug 2024 - Jun 2025

Research Intern · A*STAR Award

Computational Biology & Omics Lab, Bioinformatics Institute, A*STAR · Prof Kumar Selvarajoo

Are pseudo-bulk profiles truly equivalent to bulk RNA-seq? What do we lose by assuming they are?

Pseudo-bulk RNA-seq, where single-cell counts are aggregated per sample, is increasingly used as a proxy for bulk data. But I found that one can reliably tell the two modalities apart with a simple PCA, which suggests the aggregation process introduces systematic differences. This raises uncomfortable questions about reproducibility: are findings from one platform truly comparable to those from the other? It's a subtler problem than it looks, and the answer has implications for how we interpret most published transcriptomic studies.
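As a rough illustration of the aggregation step only, here is a toy sketch in Python. It is not the pipeline used in this project: the function name `pseudobulk`, the cell and sample IDs, and the counts are all made up for illustration.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts into one count profile per sample.

    cell_counts:    {cell_id: {gene: count}}  (hypothetical toy format)
    cell_to_sample: {cell_id: sample_id}
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for cell_id, gene_counts in cell_counts.items():
        sample = cell_to_sample[cell_id]
        for gene, count in gene_counts.items():
            profiles[sample][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in profiles.items()}


# Toy data: three cells from two samples (values are invented).
cells = {
    "cell_1": {"GAPDH": 5, "ACTB": 2},
    "cell_2": {"GAPDH": 3},
    "cell_3": {"ACTB": 4, "CD3E": 1},
}
samples = {"cell_1": "sample_A", "cell_2": "sample_A", "cell_3": "sample_B"}

profiles = pseudobulk(cells, samples)
print(profiles)
```

Each sample ends up with a single summed profile, which is the object that then gets compared against a true bulk RNA-seq profile.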

Bulk RNA-seq · Pseudo-bulk · Platform Fidelity · Generative Models · Synthetic Data

Jun - Oct 2025

iGEM 2025: Computational Team Lead · Gold Medal

School of CCEB, NTU · Prof Meng How Tan

Are conserved motifs useful for de novo protein design?

Evolution doesn't tinker at random; it conserves structural and functional motifs across billions of years. Our team asked whether these conserved motifs could serve as atomic units for de novo protein design: not generating amino acids one by one, but composing proteins from a vocabulary of known functional units. We generated de novo Cas13 variants, then validated their activity in the wet lab. Their knockdown efficacy was comparable to that of the wild type, a sign that the conserved motifs have some biological relevance.

Protein Design · Conserved Motifs · de novo Proteins · Protein Language Models · Wet Lab Validation

Aug 2025 - Present

Research Assistant · Current

School of CCEB, NTU · Prof Dan He

Can a protein language model built on motifs be more interpretable — and more correct?

Building on the iGEM work, I'm developing a motif-level language model that treats functional protein motifs as its vocabulary rather than individual residues. The hypothesis is that this is closer to how evolution actually encodes function — and may produce more biologically grounded outputs. Current work focuses on metalloproteins, where the tight constraints around metal-binding sites make motif conservation especially pronounced and testable.

Motif Grammar · Protein Generation · Metalloproteins · Interpretability · AlphaFold3

Publications

Machine learning differentiates between bulk and pseudo-bulk RNA-seq datasets

Cross-cellular analysis of chromatin accessibility markers H3K4me3 and DNase in the context of detecting cell-identity genes: an "all-or-nothing" approach