Software
I am an advocate for open and transparent science and am invested in sharing analytical innovations so that they can benefit the wider community. My team has contributed to the development of a large number of software packages for extracting meaningful and translatable information from high-dimensional biological assays.
AFid
Autofluorescence is a long-standing problem that has hindered fluorescence microscopy image analysis. To address this, we have developed a method that identifies autofluorescent signals in multi-channel images and can exclude them after acquisition.
Canete N, Baharlou H, Patrick E (2023). AFidR: A method to identify and exclude autofluorescent signals from multi-channel images post acquisition. R package version 0.0.1.
ClassifyR
The software formalises a framework for classification and survival model evaluation in R. There are four stages: data transformation, feature selection, model training, and prediction. The requirements of variable types and variable order are fixed, but specialised variables for functions can also be provided. The framework is wrapped in a driver loop that reproducibly carries out a number of cross-validation schemes. Functions for differential mean, differential variability, and differential distribution are included. Users may develop additional functions by creating an interface to the framework.
Strbenac D, Mann GJ, Ormerod JT, Yang JYH (2015). “ClassifyR: an R package for performance assessment of classification with applications to transcriptomics.” Bioinformatics, 31(11), 1851-1853.
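The four ClassifyR stages and the surrounding driver loop can be illustrated with a rough sketch in plain Python. This is not ClassifyR code and does not use its API: every function name below is hypothetical, the nearest-centroid classifier and the difference-in-class-means selection rule are stand-ins chosen for brevity, and a fixed seed is assumed as the way to make the fold assignment reproducible.

```python
import random
from statistics import mean, pstdev


def transform(train, test):
    """Stage 1, data transformation: z-score features using training data only."""
    cols = range(len(train[0]))
    mus = [mean(row[j] for row in train) for j in cols]
    sds = [pstdev([row[j] for row in train]) or 1.0 for j in cols]

    def scale(rows):
        return [[(row[j] - mus[j]) / sds[j] for j in cols] for row in rows]

    return scale(train), scale(test)


def select_features(train, labels, k):
    """Stage 2, feature selection: keep the k features whose class means
    differ the most (a simple differential-mean criterion)."""
    classes = sorted(set(labels))

    def score(j):
        means = [mean(r[j] for r, y in zip(train, labels) if y == c)
                 for c in classes]
        return max(means) - min(means)

    return sorted(range(len(train[0])), key=score, reverse=True)[:k]


def train_model(train, labels, features):
    """Stage 3, model training: one centroid per class (nearest-centroid)."""
    return {c: [mean(r[j] for r, y in zip(train, labels) if y == c)
                for j in features]
            for c in sorted(set(labels))}


def predict(model, rows, features):
    """Stage 4, prediction: assign each row to its closest class centroid."""
    def dist(row, centroid):
        return sum((row[j] - m) ** 2 for j, m in zip(features, centroid))

    return [min(model, key=lambda c: dist(row, model[c])) for row in rows]


def cross_validate(data, labels, folds=5, k=1, seed=1):
    """Driver loop: seeded k-fold cross-validation over the four stages."""
    order = list(range(len(data)))
    random.Random(seed).shuffle(order)  # same seed, same folds
    predicted = [None] * len(data)
    for f in range(folds):
        test_idx = order[f::folds]
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        train, test = transform([data[i] for i in train_idx],
                                [data[i] for i in test_idx])
        train_labels = [labels[i] for i in train_idx]
        features = select_features(train, train_labels, k)
        model = train_model(train, train_labels, features)
        for i, guess in zip(test_idx, predict(model, test, features)):
            predicted[i] = guess
    return predicted
```

Transformation and feature selection are refitted inside every fold, so no information from the held-out samples reaches the model being evaluated.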
DCARS
DCARS is a flexible statistical approach which uses local weighted correlations to build a powerful and robust statistical test to identify significant variation in levels of concordance across a ranking of samples. This has the potential to discover biologically informative relationships between genes across a variable of interest, such as survival outcome.
Ghazanfar S (2023). DCARS: Differential Correlation across Ranked Samples. R package version 0.3.5.
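The idea behind DCARS of local weighted correlations along a sample ranking can be sketched in plain Python. This is an illustration under stated assumptions, not the DCARS implementation: all function names are hypothetical, the triangular weighting scheme and its `span` parameter are assumed, the spread of the local correlations is summarised here by their standard deviation, and significance is assessed with a simple permutation of the sample order. Inputs `x` and `y` are two genes' values, already sorted by the ranking variable.

```python
import random
from math import sqrt
from statistics import stdev


def weighted_correlation(x, y, w):
    """Pearson correlation of x and y with non-negative sample weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        return 0.0  # correlation undefined; treated as zero in this sketch
    return cov / sqrt(vx * vy)


def triangular_weights(n, centre, span):
    """Weights peak at `centre` and fall linearly to zero `span` ranks away."""
    return [max(0.0, 1.0 - abs(i - centre) / span) for i in range(n)]


def local_correlations(x, y, span):
    """One weighted correlation per position in the sample ranking."""
    n = len(x)
    return [weighted_correlation(x, y, triangular_weights(n, c, span))
            for c in range(n)]


def concordance_variation(x, y, span):
    """Assumed test statistic: spread of the local correlations."""
    return stdev(local_correlations(x, y, span))


def permutation_p_value(x, y, span, n_perm=200, seed=1):
    """Compare the observed statistic with its value under shuffled rankings."""
    rng = random.Random(seed)
    observed = concordance_variation(x, y, span)
    order = list(range(len(x)))
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # break the link between samples and their rank
        px = [x[i] for i in order]
        py = [y[i] for i in order]
        if concordance_variation(px, py, span) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

A gene pair whose correlation is constant across the ranking yields local correlations that barely move, so the statistic stays near zero. A pair that is concordant at one end of the ranking and discordant at the other yields a large spread.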
directPA
Direction analysis is a set of tools designed to identify combinatorial effects of multiple treatments or conditions on pathways and kinases profiled by microarray, RNA-seq, proteomics, or phosphoproteomics data. See Yang P et al. (2014) doi:10.1093/bioinformatics/btt616 and Yang P et al. (2016) doi:10.1002/pmic.201600068.
Yang P, Patrick E (2020). directPA: Direction Analysis for Pathways and Kinases. R package version 1.5.
FuseSOM
A correlation-based multiview self-organizing map for the characterization of cell types in highly multiplexed in situ imaging cytometry assays (FuseSOM) is a tool for unsupervised clustering. FuseSOM is robust and achieves high accuracy by combining a self-organizing map architecture and a multiview integration of correlation-based metrics. This allows FuseSOM to cluster highly multiplexed in situ imaging cytometry assays.
lisaClust
lisaClust provides a series of functions to identify and visualise regions of tissue where spatial associations between cell types are similar. This package can be used to provide a high-level summary of cell-type colocalization in multiplexed imaging data that has been segmented at single-cell resolution.
Patrick E, Canete N (2023). lisaClust: Clustering of Local Indicators of Spatial Association. R package version 1.9.1.
scHOT
Single cell Higher Order Testing (scHOT) is an R package that facilitates testing changes in higher order structure of gene expression along either a developmental trajectory or across space. scHOT is general and modular in nature: it can be run in multiple data contexts, such as along a continuous trajectory, between discrete groups, and over spatial orientations, and it can accommodate any higher order measurement, such as variability or correlation. scHOT meaningfully adds to first order effect testing, such as differential expression, and provides a framework for interrogating higher order interactions from single cell data.
Ghazanfar S, Lin Y (2023). scHOT: single-cell higher order testing. doi:10.18129/B9.bioc.scHOT, R package version 1.12.0.
scFeatures
scFeatures generates multi-view representations of single-cell and spatial data through the construction of a total of 17 feature types. These features can then be used for a variety of analyses using other software in Bioconductor.
Cao Y, Lin Y, Patrick E, Yang P, Yang JYH (2022). “scFeatures: multi-view representations of single-cell and spatial data for disease outcome prediction.” Bioinformatics, 38(20), 4745-4753. ISSN 1367-4803, doi:10.1093/bioinformatics/btac590.
scMerge
Like all gene expression data, single-cell data suffer from batch effects and other unwanted variation that make accurate biological interpretation difficult. The scMerge method leverages factor analysis, stably expressed genes (SEGs) and (pseudo-)replicates to remove unwanted variation and merge multiple single-cell datasets. This package contains all the necessary functions in the scMerge pipeline, including the identification of SEGs, replication-identification methods, and merging of single-cell data.
Lin Y, Ghazanfar S, Wang K, Gagnon-Bartsch J, Lo K, Su X, Han Z, Ormerod J, Speed T, Yang P, Yang J (2019). “scMerge leverages factor analysis, stable expression, and pseudoreplication to merge multiple single-cell RNA-seq datasets.” Proceedings of the National Academy of Sciences. doi:10.1073/pnas.1820006116.
simpleSeg
Image segmentation is the process of identifying the borders of individual objects (in this case cells) within an image. This allows features of cells, such as marker expression and morphology, to be extracted, stored and analysed. simpleSeg provides functionality for user-friendly, watershed-based segmentation of multiplexed cellular images in R, based on the intensity of user-specified protein marker channels. simpleSeg can also be used for the normalization of single cell data obtained from multiple images.
Canete N, Nicholls A, Patrick E (2023). simpleSeg: A package to perform simple cell segmentation. doi:10.18129/B9.bioc.simpleSeg, R package version 1.2.3.
spicyR
The spicyR package provides a framework for performing inference on changes in spatial relationships between pairs of cell types for cell-resolution spatial omics technologies. spicyR consists of three primary steps: (i) summarizing the degree of spatial localization between pairs of cell types for each image; (ii) modelling the variability in localization summary statistics as a function of cell counts and (iii) testing for changes in spatial localizations associated with a response variable.
Canete N, Iyengar S, Ormerod J, Baharlou H, Harman A, Patrick E (2022). “spicyR: spatial analysis of in situ cytometry data in R.” Bioinformatics, 38(11), 3099–3105. doi:10.1093/bioinformatics/btac268.
Statial
Statial is a suite of functions for identifying changes in cell state. Statial provides robust quantification of cell type localisation that is invariant to changes in tissue structure. In addition, Statial uncovers changes in marker expression associated with varying levels of localisation. These features can be used to explore how the structure and function of different cell types may be altered by the agents they are surrounded by.
Ameen F, Iyengar S, Ghazanfar S, Patrick E (2023). Statial: A package to identify changes in cell state relative to spatial associations. doi:10.18129/B9.bioc.Statial, R package version 1.2.2.
MoleculeExperiment
MoleculeExperiment contains functions to create and work with objects from the new MoleculeExperiment class. We introduce this class for analysing molecule-based spatial transcriptomics data (e.g., Xenium by 10x Genomics, CosMx SMI by NanoString, and MERSCOPE by Vizgen). This allows researchers to analyse spatial transcriptomics data at the molecule level, and to have standardised data formats across vendors.
Peters Couto B, Robertson N, Patrick E, Ghazanfar S (2023). MoleculeExperiment: Prioritising a molecule-level storage of Spatial Transcriptomics Data. doi:10.18129/B9.bioc.MoleculeExperiment, R package version 1.0.2.
treekoR
treekoR is a novel framework that aims to utilise the hierarchical nature of single cell cytometry data to find robust and interpretable associations between cell subsets and patient clinical end points. These associations aim to recapitulate the nested proportions prevalent in workflows involving manual gating, which are often overlooked in workflows using automatic clustering to identify cell populations. We developed treekoR to: derive a hierarchical tree structure of cell clusters; quantify each cell type as a proportion relative to all cells in a sample (%total) and as a proportion relative to a parent population (%parent); perform significance testing using the calculated proportions; and provide an interactive HTML visualisation to help highlight key results.
Chan A (2023). treekoR: Cytometry Cluster Hierarchy and Cellular-to-phenotype Associations. doi:10.18129/B9.bioc.treekoR, R package version 1.8.0.
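The difference between the %total and %parent proportions used by treekoR can be shown with a small sketch in plain Python. This is not the treekoR API: the function names, the `parent` map encoding of the cluster hierarchy, and the example cluster names are all assumptions made for illustration.

```python
def node_counts(leaf_counts, parent):
    """Sum leaf cell counts up the tree so every node holds the number of
    cells in itself and all of its descendants."""
    totals = {}
    for leaf, n in leaf_counts.items():
        node = leaf
        while node is not None:
            totals[node] = totals.get(node, 0) + n
            node = parent.get(node)  # the root maps to None
    return totals


def proportions(leaf_counts, parent):
    """Return %total and %parent for every non-root node of one sample."""
    totals = node_counts(leaf_counts, parent)
    sample_total = sum(leaf_counts.values())
    result = {}
    for node, n in totals.items():
        up = parent.get(node)
        if up is None or totals[up] == 0 or sample_total == 0:
            continue  # skip the root and empty branches
        result[node] = {
            "pct_total": 100.0 * n / sample_total,
            "pct_parent": 100.0 * n / totals[up],
        }
    return result


# Hypothetical hierarchy for one sample: child cluster -> parent cluster.
parent = {
    "CD4 T": "T cells",
    "CD8 T": "T cells",
    "T cells": "all cells",
    "B cells": "all cells",
    "all cells": None,
}
leaf_counts = {"CD4 T": 30, "CD8 T": 10, "B cells": 60}
sample_proportions = proportions(leaf_counts, parent)
```

In this made-up sample, CD4 T cells are 30% of all cells but 75% of their parent T cell population. A shift in the second figure can occur without any change in the first, which is the kind of nested signal that manual gating captures.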
TOP
TOP constructs a transferable model across gene expression platforms for prospective experiments. Such a transferable model can be trained to make predictions on independent validation data with an accuracy that is similar to a re-substituted model. The TOP procedure also has the flexibility to be adapted to suit the most common clinical response variables, including linear response, binomial and Cox PH models.
Robertson H, Robertson N (2023). TOP: TOP Constructs Transferable Model Across Gene Expression Platforms. doi:10.18129/B9.bioc.TOP, R package version 1.0.0.