Flow cytometry#
Flow cytometry is a technique used to detect and measure physical and chemical characteristics of a population of cells or particles (wiki).
Here, we’ll walk through how to:
- iteratively ingest datasets
- query, search, integrate & analyze datasets
Setup#
!lamin init --storage ./test-flow --schema bionty
✅ saved: User(id='DzTjkKse', handle='testuser1', email='testuser1@lamin.ai', name='Test User1', updated_at=2023-09-26 10:02:21)
✅ saved: Storage(id='oieE0n5L', root='/home/runner/work/lamin-usecases/lamin-usecases/docs/test-flow', type='local', updated_at=2023-09-26 10:02:21, created_by_id='DzTjkKse')
💡 loaded instance: testuser1/test-flow
💡 did not register local instance on hub (if you want, call `lamin register`)
import lamindb as ln
import lnschema_bionty as lb
import readfcs
lb.settings.species = "human"
💡 loaded instance: testuser1/test-flow (lamindb 0.54.1)
ln.track()
💡 notebook imports: lamindb==0.54.1 lnschema_bionty==0.31.2 pytometry==0.1.4 readfcs==1.1.6 scanpy==1.9.5
💡 Transform(id='OWuTtS4SAponz8', name='Flow cytometry', short_name='facs', version='0', type=notebook, updated_at=2023-09-26 10:02:23, created_by_id='DzTjkKse')
💡 Run(id='HDsxREvsdSEFMCUgXjYW', run_at=2023-09-26 10:02:23, transform_id='OWuTtS4SAponz8', created_by_id='DzTjkKse')
Ingest a first file#
Access #
We start with a flow cytometry file from Alpert et al., Nat. Med. (2019).
Calling the following function downloads the file and pre-populates a few relevant registries:
ln.dev.datasets.file_fcs_alpert19(populate_registries=True)
PosixPath('Alpert19.fcs')
We use readfcs to read the raw fcs file into memory:
adata = readfcs.read("Alpert19.fcs")
adata
AnnData object with n_obs × n_vars = 166537 × 40
var: 'n', 'channel', 'marker', '$PnB', '$PnE', '$PnR'
uns: 'meta'
Transform: normalize #
In this use case, we’d like to ingest and store curated data. Hence, we split the signal by type and apply an arcsinh normalization using the pytometry package.
import pytometry as pm
pm.pp.split_signal(adata, var_key="channel")
'area' is not in adata.var['signal_type']. Return all.
pm.tl.normalize_arcsinh(adata, cofactor=150)
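To make the normalization step concrete, here is a minimal sketch of an arcsinh transform with a cofactor, written in plain Python. It assumes the common cytometry convention of computing asinh(x / cofactor) per intensity value; the function name `arcsinh_normalize` and the sample intensities are hypothetical, and pytometry's own implementation may differ in details.

```python
import math


def arcsinh_normalize(values, cofactor=150):
    """Return asinh(x / cofactor) for each raw intensity x (assumed convention)."""
    if cofactor <= 0:
        raise ValueError("cofactor must be positive")
    return [math.asinh(x / cofactor) for x in values]


# Made-up raw intensities spanning several orders of magnitude.
raw = [0, 150, 1500, 15000]
normalized = arcsinh_normalize(raw, cofactor=150)

for x, y in zip(raw, normalized):
    print(f"{x:>6} -> {y:.3f}")
```

The transform is close to linear for values well below the cofactor and close to logarithmic for values well above it, which is why the cofactor is usually chosen per technology.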
Validate: cell markers #
First, we validate the features in .var using CellMarker:
validated = lb.CellMarker.validate(adata.var.index)
❗ 13 terms (32.50%) are not validated for name: Time, Cell_length, Dead, (Ba138)Dd, Bead, CD19, CD4, IgD, CD11b, CD14, CCR6, CCR7, PD-1
We see that 13 of the 40 features aren’t validated, mostly because their names aren’t standardized.
Hence, let’s standardize feature names & validate again:
adata.var.index = lb.CellMarker.standardize(adata.var.index)
validated = lb.CellMarker.validate(adata.var.index)
❗ 5 terms (12.50%) are not validated for name: Time, Cell_length, Dead, (Ba138)Dd, Bead
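The validate, standardize, validate pattern can be pictured with a toy registry. The sketch below is an illustration only, not how lnschema_bionty implements these methods: the `REGISTRY` dictionary, its synonym pairs, and both helper functions are hypothetical, although the canonical spellings `Cd19`, `Ccr6`, and `CD11B` match names that appear in the registered feature set later in this walkthrough.

```python
# Hypothetical mini-registry: canonical marker name -> known synonyms.
REGISTRY = {
    "CD33": set(),
    "Cd19": {"CD19"},
    "Ccr6": {"CCR6"},
    "CD11B": {"CD11b"},
}


def validate(names, registry=REGISTRY):
    """Return one boolean per name: True if it is a canonical registry name."""
    return [name in registry for name in names]


def standardize(names, registry=REGISTRY):
    """Map known synonyms to canonical names and leave unknown names unchanged."""
    synonym_to_name = {
        synonym: canonical
        for canonical, synonyms in registry.items()
        for synonym in synonyms
    }
    return [synonym_to_name.get(name, name) for name in names]


panel = ["CD33", "CD19", "CCR6", "CD11b", "Time"]
print(validate(panel))         # [True, False, False, False, False]

standardized = standardize(panel)
print(standardized)            # ['CD33', 'Cd19', 'Ccr6', 'CD11B', 'Time']
print(validate(standardized))  # [True, True, True, True, False]
```

Names such as `Time` stay unvalidated after standardization because no synonym maps them to a marker, which is the signal that they are not cell markers at all.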
The remaining non-validated features don’t appear to be cell markers but rather metadata features.
Let’s move them into adata.obs:
adata.obs = adata[:, ~validated].to_df()
adata = adata[:, validated].copy()
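The two lines above rely on a boolean mask over columns: the inverted mask selects the metadata columns, and the mask itself selects the markers. A minimal sketch of the same split on a plain table follows. The column names echo this dataset, but the values, the mask, and the helper `select_columns` are made up for illustration.

```python
# Toy table: column names plus rows of measurements (values are made up).
columns = ["CD33", "Time", "Cd19", "Dead"]
rows = [
    [1.2, 0.0, 0.4, 0.1],
    [0.8, 1.0, 2.3, 0.0],
]
# One flag per column: True if the column is a validated cell marker.
validated = [True, False, True, False]


def select_columns(columns, rows, mask):
    """Keep only the columns whose mask entry is True."""
    keep = [i for i, flag in enumerate(mask) if flag]
    kept_columns = [columns[i] for i in keep]
    kept_rows = [[row[i] for i in keep] for row in rows]
    return kept_columns, kept_rows


# Non-validated columns become per-cell metadata; validated ones stay as markers.
obs_columns, obs_rows = select_columns(columns, rows, [not v for v in validated])
var_columns, var_rows = select_columns(columns, rows, validated)

print(obs_columns, obs_rows)
print(var_columns, var_rows)
```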
Now we have a clean panel of 35 validated cell markers:
validated = lb.CellMarker.validate(adata.var.index)
assert all(validated) # all markers are validated
Register: metadata #
Next, let’s register the metadata features we moved to .obs.
For this, we create one feature record for each column in the .obs dataframe:
features = ln.Feature.from_df(adata.obs)
ln.save(features)
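Conceptually, creating one feature record per column means pairing each column name with an inferred type. The sketch below shows that idea on a plain dictionary of columns. The class `FeatureSketch`, the helper `features_from_columns`, the sample values, and the `"category"` label for non-numeric columns are assumptions for illustration; only the `number` type is visible in this walkthrough's output.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class FeatureSketch:
    """Hypothetical stand-in for a feature record: a name and a coarse type."""
    name: str
    type: str


def features_from_columns(table):
    """Create one FeatureSketch per column of a {column name: values} table."""
    records = []
    for name, values in table.items():
        is_numeric = all(
            isinstance(v, Number) and not isinstance(v, bool) for v in values
        )
        records.append(FeatureSketch(name, "number" if is_numeric else "category"))
    return records


# Made-up values for three of the metadata columns moved to .obs.
obs = {"Time": [0.0, 1.5], "Cell_length": [23, 31], "Dead": [0.1, 0.0]}
feature_records = features_from_columns(obs)

for record in feature_records:
    print(f"{record.name} ({record.type})")
```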
We use the Experimental Factor Ontology through Bionty to create a “FACS” label for the dataset:
lb.ExperimentalFactor.bionty().search("FACS").head(2) # search the public ontology
name | ontology_id | definition | synonyms | parents | molecule | instrument | measurement | __ratio__
---|---|---|---|---|---|---|---|---
fluorescence-activated cell sorting | EFO:0009108 | A Flow Cytometry Assay That Provides A Method ... | FACS\|FAC sorting | [] | None | None | None | 100.0
BALB/c | EFO:0000602 | Balb/C Is A Mouse Strain Of Albion Mice. | BALB/cJ\|C\|BALBc | [] | None | None | None | 90.0
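The search result ranks ontology terms by a similarity score and also considers synonyms, which is why the query "FACS" finds "fluorescence-activated cell sorting". The sketch below imitates that behavior with Python's standard-library `difflib`. It is a stand-in under stated assumptions: the actual scoring method behind `__ratio__` is not shown in this walkthrough, so the scores produced here will generally differ from those in the table. The two entries in `ONTOLOGY` are copied from the result above, and the helper names are hypothetical.

```python
from difflib import SequenceMatcher

# Term name -> synonyms, copied from the two search hits shown above.
ONTOLOGY = {
    "fluorescence-activated cell sorting": ["FACS", "FAC sorting"],
    "BALB/c": ["BALB/cJ", "C", "BALBc"],
}


def similarity(a, b):
    """Case-insensitive similarity on a 0-100 scale (difflib, not the real scorer)."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(query, ontology=ONTOLOGY):
    """Rank terms by the best match among each term's name and synonyms."""
    scored = []
    for name, synonyms in ontology.items():
        best = max(similarity(query, candidate) for candidate in [name, *synonyms])
        scored.append((name, round(best, 1)))
    return sorted(scored, key=lambda item: item[1], reverse=True)


results = search("FACS")
for name, score in results:
    print(f"{score:>5}  {name}")
```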
# import the record from the public ontology and save it to the registry
lb.ExperimentalFactor.from_bionty(ontology_id="EFO:0009108").save()
# show the content of the registry
lb.ExperimentalFactor.filter().df()
id | name | ontology_id | abbr | synonyms | description | molecule | instrument | measurement | bionty_source_id | updated_at | created_by_id
---|---|---|---|---|---|---|---|---|---|---|---
lh5Cxy8w | fluorescence-activated cell sorting | EFO:0009108 | None | FACS\|FAC sorting | A Flow Cytometry Assay That Provides A Method ... | None | None | None | CfKG | 2023-09-26 10:02:31 | DzTjkKse
Register: data & annotate with metadata #
modalities = ln.Modality.lookup()
features = ln.Feature.lookup()
efs = lb.ExperimentalFactor.lookup()
species = lb.Species.lookup()
file = ln.File.from_anndata(
    adata, description="Alpert19", field=lb.CellMarker.name, modality=modalities.protein
)
... storing '$PnE' as categorical
... storing '$PnR' as categorical
file.save()
Annotate by linking FACS & species labels:
file.labels.add(efs.fluorescence_activated_cell_sorting, features.assay)
file.labels.add(species.human, features.species)
Inspect the registered file#
Inspect features on a high level:
file.features
Features:
var: FeatureSet(id='KeJZwaJint39ughMvyEI', n=35, type='number', registry='bionty.CellMarker', hash='ldY9_GmptHLCcT7Nrpgo', updated_at=2023-09-26 10:02:32, modality_id='8Imbc507', created_by_id='DzTjkKse')
'CD33', 'CD11c', 'Ccr6', 'DNA1', 'CXCR3', 'CD11B', 'CD8', 'HLADR', 'Cd19', 'CD3', ...
obs: FeatureSet(id='mypZ9Yrersb96R5mS3Oj', n=5, registry='core.Feature', hash='wuGplW4XuoMvHi22B0Ir', updated_at=2023-09-26 10:02:32, modality_id='iUJ9JARf', created_by_id='DzTjkKse')
Bead (number)
Time (number)
Dead (number)
Cell_length (number)
(Ba138)Dd (number)
external: FeatureSet(id='odDTC6KjllzbqtYg1LnT', n=2, registry='core.Feature', hash='UZN25UlFEndb14m2m-P4', updated_at=2023-09-26 10:02:32, modality_id='iUJ9JARf', created_by_id='DzTjkKse')
🔗 assay (1, bionty.ExperimentalFactor): 'fluorescence-activated cell sorting'
🔗 species (1, bionty.Species): 'human'
Inspect low-level features in .var:
file.features["var"].df().head()
id | name | synonyms | gene_symbol | ncbi_gene_id | uniprotkb_id | species_id | bionty_source_id | updated_at | created_by_id
---|---|---|---|---|---|---|---|---|---
c3dZKHFOdllB | CD33 | | CD33 | 945 | P20138 | uHJU | CgXV | 2023-09-26 10:02:27 | DzTjkKse
L0WKZ3fufq0J | CD11c | | ITGAX | 3687 | P20702 | uHJU | CgXV | 2023-09-26 10:02:27 | DzTjkKse
fpPkjlGv15C9 | Ccr6 | | CCR6 | 1235 | P51684 | uHJU | CgXV | 2023-09-26 10:02:27 | DzTjkKse
YA5Ezh6SAy10 | DNA1 | | None | None | None | uHJU | CgXV | 2023-09-26 10:02:27 | DzTjkKse
agQD0dEzuoNA | CXCR3 | | CXCR3 | 2833 | P49682 | uHJU | CgXV | 2023-09-26 10:02:27 | DzTjkKse
Use auto-complete for marker names:
markers = file.features["var"].lookup()
import scanpy as sc
sc.pp.pca(adata)
sc.pl.pca(adata, color=markers.cd14.name)
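The lookup objects used in this walkthrough expose records under attribute-friendly keys, for example `markers.cd14` or `efs.fluorescence_activated_cell_sorting`. A rough way to picture this is a namespace whose keys are sanitized versions of the record names. The sketch below is an assumption about the idea, not the library's implementation: `to_attribute_key` and `make_lookup` are hypothetical helpers, and the values here are plain strings rather than records with a `.name` attribute.

```python
import re
from types import SimpleNamespace


def to_attribute_key(name):
    """Lowercase a name and replace runs of non-alphanumerics with underscores."""
    key = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
    # Python attribute names cannot be empty or start with a digit.
    if not key or key[0].isdigit():
        key = "_" + key
    return key


def make_lookup(names):
    """Build a namespace mapping sanitized keys to the original names.

    If two names sanitize to the same key, the later one wins.
    """
    return SimpleNamespace(**{to_attribute_key(name): name for name in names})


marker_lookup = make_lookup(["CD14", "CD33", "HLADR"])
factor_lookup = make_lookup(["fluorescence-activated cell sorting"])

print(marker_lookup.cd14)
print(factor_lookup.fluorescence_activated_cell_sorting)
```

Because the keys are ordinary attributes, an interactive shell or notebook can offer them through tab completion, which avoids typos in marker names.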