Essential Neural Network Knowledge
August 05, 2025
Tags: AI, DeepLearning
I learned deep learning and neural networks many years ago, but now I want to refresh my knowledge a bit, mainly because after many years of development there are many new and better materials, and I also want to rethink what I should apply deep learning (AI) to in my work.
Essential C++ Knowledge
December 12, 2023
Tags: C++
I have not coded in C++ for exactly 10 years, since graduating from the Department of CS. Now I need to pick it up again. Here are some very short notes to help me do that.
Work on BAM and Sequence with R
April 06, 2023
Tags: R, BAM, String
BAM files are large and are normally handled with low-level languages like C/C++. Since I mostly use R, here is a collection of my code to read/modify BAM files.
Use Cron job in Linux
March 20, 2023
Tags: Linux, DevOps
A new trick: setting up automatic cron jobs for my Linux tasks.
Work with SQLite with R or Shell
March 19, 2023
Tags: SQLite, R, Shell
In my work I sometimes want to create a lightweight database, and compared with databases like PostgreSQL or MongoDB, SQLite is a really good choice. Here I record some key code I use to interact with SQLite in R or Shell.
ChAMP 3.0: illumina HumanMethylation EPIC V2
February 17, 2023
Tags: ChAMP
I am updating ChAMP for the new illumina Human Methylation Array, V2. Here are some notes.
Popular Regression Models I should know
January 10, 2023
Tags: Regression
While taking the Coursera regression course, I wanted to build a collection of the commonly used regression models.
Coursera - Regression Models
December 31, 2022
Tags: Coursera, Regression, R
Regression is an important tool that I should have dug into years ago. Here are just some key notes I wrote down while taking the Coursera course Regression Models.
Regex snippets in R
December 02, 2022
Tags: R, Regex
My R regex collection, which will be used extensively in daily work.
Complex ggplot2: boxplot
September 27, 2022
Tags: ggplot2
My collection of code for ggplot2 boxplots.
Install MongoDB on Mac M1 CPU
September 21, 2022
Tags: MongoDB
I never thought it would be so hard to install MongoDB on a Mac. It took me over an hour to figure out. Here are some steps and notes I can follow in the future.
Complex ggplot2: ScatterPlot
September 02, 2022
Tags: ggplot2, R
Here are some nice plots I drew with ggplot2, for copy and paste in the future.
Apply H2O.ai for Quick AutoML Task
August 16, 2022
Tags: Machine Learning, AutoML
I often need to use machine learning to quickly create a classification model, and AutoML seems to be a really good way to do it. Here I record some of my code snippets for running H2O.ai, so that in the future I can reuse this code quickly for prototyping.
Calculate VCF similarity with somalier
August 15, 2022
Tags: VCF
My task was to calculate the similarity between VCF files, and somalier did this successfully. Here is some code I may use in the future.
Liftover VCF from hg37 to hg38 with Picard
August 14, 2022
Tags: VCF, Liftover
In a recent task, I needed to compare a set of VCFs generated on hg37 with another set generated on hg38, so I had to lift over the hg37 VCFs to hg38. The solution I found was Picard.
Understand VCF File Format
August 11, 2022
Tags: VCF, Mutation, Sarek
It is a shame that I never looked closely at the VCF file format, mainly because I have been working on methylation. Now I finally have a chance to get familiar with mutation pipelines and VCF files.
R Plotly Quick Code
July 24, 2022
Tags: R, Plotly, Web, Visualisation
I have not used Plotly for a couple of years, but it is still such a good tool for interactive figures. Here is the code I used to draw Plotly figures.
Complex ggplot2: Line Plot
July 04, 2022
Tags: R, ggplot2
Here are some nice plots I drew with ggplot2, for copy and paste in the future.
My R Snippets
July 01, 2022
Tags: R
Pieces of R code I may need to copy-paste in future work.
Complex Heatmaps with R
July 01, 2022
Tags: R, Heatmap, Visualisation
My code notes on how to generate nice and complex heatmaps.
PCA Projection on Website with R script run at backend
June 11, 2022
Tags: R, React, Full-Stack
Drawing PCA plots is very common in the bioinformatics world, and it is cool to let a PCA model project new data onto the original plot, so that new users can see how their data clusters with the existing PCA plot. I managed to create a single front-to-backend webpage to do this online.
JavaScript Snippets: Promise for multiple queries
June 04, 2022
Tags: JavaScript, Backend, MongoDB
In one of my backend projects, I need to query multiple tables and gather the results after all the searches are done, so a single normal promise will not work.
Nanopore Analysis: Bonito for BaseCalling
June 02, 2022
Tags: Nanopore
After using Guppy, I was recommended Bonito for base calling; it is said to have quite high accuracy compared with older methods. More importantly, it can directly use Remora for methylation analysis. Here is a record of how I use Bonito for base calling.
Parallel in R
May 04, 2022
Tags: R, parallel
Here is a simple record of several ways to run R code in parallel.
ChAMP 3.0: champ.Overlap()
April 12, 2022
Tags: ChAMP, annotation
A new function added to ChAMP for general mapping between two sets of genomic segments, e.g. mapping CpGs, peaks, etc. to genes, CpG islands, etc.
Complex ggplot2: Barplot
April 07, 2022
Tags: visualisation, ggplot2
ggplot2 is nice, but can be quite complex... Here are some nice ggplot2 barplots I have drawn.
Nanopore Analysis: Guppy for BaseCalling
April 05, 2022
Tags: nanopore
Trying out Guppy, the official Nanopore software for base calling, i.e. converting "pore signals" into FASTQ ATCG information. Guppy can actually also do alignment, barcoding, trimming, etc.
Nanopore Analysis: Handle Fast5 File
April 05, 2022
Tags: nanopore, alignment
Recently I have been working on Nanopore data; first I am trying to set up a pipeline for 5hmC/5mC and SNP calling. In this post, I record my first exploration of fast5 data, the default output format of Nanopore machines.
Access to hg38 knownGene Genome Annotation
March 28, 2022
Tags: annotation
In a previous post, I compared 5 different versions of genome annotation, and knownGene seems the best to use. It contains the most transcripts and is updated gradually with GENCODE. Here I record how to fetch, organise and use knownGene.
Which Human Genome Annotation should I use?
March 24, 2022
Tags: annotation
Genome annotation is vital and fundamental to all bioinformatic analysis. However, unexpectedly, I found that the differences and relationships between genome annotations are not easy to find. Here is my record of looking into genome annotation.
Integrate Docker with Nextflow
March 12, 2022
Tags: nextflow, docker
After learning the minimum knowledge about Docker and Nextflow, now I want to combine the two. To me, the hardest part of both Docker and Nextflow is the file system, which requires files/folders to match exactly between containers and the host machine.
Essential Docker Knowledge
March 10, 2022
Tags: docker, nextflow
Recently I have been working on a Nextflow pipeline that contains a couple of processes. We want to use Docker to run each process, so here I record how to create a Docker image for a process, run it, collect results, etc.
Essential Nextflow knowledge
March 09, 2022
Tags: nextflow
I am now learning Nextflow. I feel that the tutorials cover too many parameters and too much syntax that I will not actually need, so here I just record the key patterns I will use in my work.
My ChIP-Seq Preprocess Pipeline
February 28, 2022
Tags: ChIP-Seq, Pipeline
I am now organising the code for a ChIP-seq paper, so I record the preprocessing steps for future use.
Learn Nextflow: Part 1
February 10, 2022
Tags: nextflow
Recently I needed to write a Nextflow pipeline for RRBS analysis, so I learned some Nextflow. Here is my record; it contains what I see as the most important and essential knowledge about this pipeline tool.
Use CollectHsMetrics (Picard) on GemBS Bam file
February 09, 2022
Tags: Debug, BAM, GemBS
I spent two days making CollectHsMetrics run on GemBS-created BAM files. The bug was caused by different ways of generating MD5. Here is a record of how to make it work.
Learning Machine Learning-2: Model Selection
February 06, 2022
Tags: machine learning
For a long time I did not study machine learning systematically; it was called pattern recognition when I was at university. Now I am reading 《机器学习》 (Machine Learning), written by a famous Chinese machine learning expert. This post covers the second chapter of the book, a quick introduction to model selection.
ChAMP 3.0: champ.PrepareManifest()
February 06, 2022
Tags: ChAMP
A new function added to the ChAMP family: champ.PrepareManifest() reads a raw illumina manifest (or a custom one) and creates ChAMP-readable probe-to-CpG mapping information.
Nextflow: nf-core sarek
January 27, 2022
Tags: nextflow, WGS
For many years I wanted to learn Nextflow but never got the chance; now it is time. My task is to run nf-core sarek on PGP-UK WGS/WES data. It is also my first time trying to call SNPs from WGS data.
Download SRA Sequence
January 17, 2022
Tags: SRA
A record of some commands that can be used to download data from SRA.
Modify Image exif Information
December 31, 2021
Tags: R
On the last day of 2021, I made a website for my lovely doggy Mountain. However, while organising two years' worth of his photos, I noticed that the timestamps of all images from my camera were wrong. So I tried to fix this by re-annotating the image timestamps.
Install Hackintosh Mojave on S200 Machine
December 16, 2021
Tags: Hackintosh
This is my record of how to install Hackintosh Mojave on my S200 machine. This version of macOS is already a bit outdated, but I am not sure whether my machine can run a newer version; this is just a record with links I can check later.
Generate Single Cell Reference for EpiSCORE
December 07, 2021
Tags: cell-type-deconvolution, methylation
This is my trace-back of how to reproduce the EpiSCORE single-cell matrix; I will use this note to create tissue cell-type fraction matrices in the future.
Commands needed to install R on Ubuntu
December 06, 2021
Tags: R
It is always hard to install R on Linux by compiling it... Here is some quick code I recorded for installing R on Ubuntu.
Import GemBS output into methylKit
November 28, 2021
Tags: RRBS, methylKit
Previously I preprocessed RRBS data with GemBS; now I want to continue with the downstream analysis. The tool I am using is methylKit, so I found a way to import the GemBS output into methylKit.
Analyse RRBS with GemBS
November 18, 2021
Tags: RRBS, Pipeline
One of my recent projects is running RRBS analysis with GemBS. This is some code I recorded during my analysis.
My R Data.Table Quick Code
October 19, 2021
Tags: R
Finally I started to use data.table; it is really fast and cool. However, learning data.table is a bit like learning key functions such as aggregate, etc. Here I record some of my key code.
Send Email in R with mailR
October 05, 2021
Tags: R
In many cases we may want to send email with R. For example, when running a very long program, we may want a reminder email after it finishes. Or, like me, we may want to create a website with an R backend that does some work automatically. Based on my tests, mailR is one solution.
Learn Bayesian Inference 1: Understand Bayesian Probability
September 06, 2021
Tags: Bayesian
For a long time I have wanted to learn Bayesian inference, and finally here is a start. The first lesson is to better understand Bayesian probability.
Bias in ChIP-seq visualisation
August 24, 2021
Tags: ChIP-seq, IGV, MAnorm2
Recently I have been working on some ChIP-seq data. However, after normalisation, I noticed that some of my results did not agree with the IGV visualisation. I found a way to adjust the normalised plot, and it works well.
MAnorm2 package for hMeDIP-seq 2: Normalisation & Analysis
August 05, 2021
Tags: MAnorm2, MeDIP-seq
After the preparation work with MAnorm2_utils, I got the read count and occupancy matrices, which I can then read into MAnorm2. Based on my tests, it gave me the best normalisation result and the easiest-to-plot distribution.
MAnorm2 package for hMeDIP-seq 1: Preparation
August 03, 2021
Tags: MAnorm2, MeDIP-seq, Preprocess
After working with QSEA and DiffBind, now I want to try the last solution, MAnorm2, for my hMeDIP-seq data. Contrary to my expectations, it is indeed a bit hard to run, and this is the first half of my record, which covers preparation with MAnorm2_utils.
QSEA package for hMeDIP-seq preprocess
August 02, 2021
Tags: MeDIP-seq, Preprocess
I am working on some hMeDIP-seq data. Previously I used a MACS2 + DiffBind pipeline, but eventually I got some unexpected results. I think the reason could be the filtering of peaks during the preprocessing stage. So I found the QSEA R package, which reads BAM files for analysis.
GO bubble plot with David and ggplot2
May 25, 2021
Tags: R, GO, ggplot2
For a long time I wanted to find a way to handle GO analysis in my daily work, and eventually I think David + ggplot2 is not a bad option: it is easy to use, easy to understand, and David is well known enough for most analyses.
Approaches to quickly get gene or promoter coordinates
May 03, 2021
Tags: R, Annotation
It is a very common requirement in bioinformatics to get gene coordinates or promoter region locations across the whole genome. Here I record ways to do it.
Create Desktop Apps with React NodeGUI
April 24, 2021
Tags: React, NodeGUI
Recently I wanted to create a cross-platform app, and since I only know React, I looked for possible solutions such as Electron and React NodeGUI. Today is my first try with React NodeGUI, and there are indeed some tricky parts beyond the starter code.
Read count and initial quality check plots on BAM files
March 14, 2021
Tags: R, MeDIP-seq, BAM, PCA
I want to quickly check the quality of a set of BAM files generated by bowtie2. However, I did not find many tools for it; eventually I found that multiBamSummary is one solution. Here I record how I get a read count matrix from a set of BAM files and then draw a quick Plotly plot for visualisation.
Deeper look at multi-dimensional array() and apply() in R
March 06, 2021
Tags: R
Recently my wife asked how to understand 3-dimensional arrays in R, and how the apply() function works when applied to multi-dimensional arrays. After digging further, I think it is really interesting...
Develop Gatsby on Kindle Browser
February 19, 2021
Tags: Kindle, Gatsby
Previously I tested that Gatsby works on Kindle. Now the problem is: how can I develop for it? I do not want to deploy my code to Github Pages every time. Finally I found that gatsby serve works for Kindle.
Kindle Browser access Gatsby and React Github page
February 10, 2021
Tags: Github, Gatsby, React
Previously I was working on a cool, simple project, Feed-Ink, a tool to help me read RSS feeds on my Kindle. However, after finishing nearly all the development, I found I could not use my Kindle to access the static page deployed on Github. Here I want to test this a bit.
Using MethylCIBERSORT for Cell Type Deconvolution
January 18, 2021
Tags: cell-type-deconvolution, methylation
Cell-type deconvolution is an important angle in DNA methylation or RNA-seq analysis. Besides the previous reference-based (refbase) and reference-free (reffree) methods, there is a recent package called MethylCIBERSORT, which can also do this and is quite easy to use.
Compare Transcription Factor Peaks between ENCODE and TFregulomeR
January 17, 2021
Tags: TFBS, TFEA
Previously I found a very good database, TFregulomeR. However, before I move on, I want to check the quality of TFregulomeR by comparing its peaks to ENCODE.
Use TFregulomeR to get Transcription Factor Binding Peaks
January 16, 2021
Tags: TFBS, TFEA
Recently I have been trying to do a set of Transcription Factor Enrichment Analyses for my 5hmC (hMeDIP-seq) data. I used to use ENCODE for this, but I found that ENCODE has very few TFs for my cell line (mouse intestine). So I started looking for other databases and tools, and found quite a lot. Among them, TFregulomeR is a new but quite powerful tool.
A ChAMP function to generate various Gene Features
January 03, 2021
Tags: ChAMP, GeneFeature, Annotation
Recently, while working on the latest Mouse Methylation Array, I found that the manifest does not have gene annotations such as promoter, TSS200, exon, etc., so I wrote a function to generate all these gene features from UCSC refGene.
Generate ChAMP Annotation from illumina manifests
January 01, 2021
Tags: ChAMP, Annotation
I am coding the third version of ChAMP, and here is the first task I have encountered: converting the illumina CSV into ChAMP annotation. Previously I only used some ad hoc code to achieve this, but this time I decided to turn it into scripts for future use.
Colourful output in R in terminal
December 30, 2020
Tags: R
I have recently been improving ChAMP, and I want to improve its message output a bit. More specifically, I want the printed output to be better formatted, with 2-3 colour styles to indicate important messages or code snippets. Finally I found some ways to do this.
Connect S3 with AWS CLI
December 13, 2020
Tags: aws, S3
My first time doing things related to AWS. I had never used anything related to AWS because I did not want to pay for it. Thanks to my friends, I now have a chance to get to know this famous cloud service provider.
Copy R objects into Clipboard
November 17, 2020
Tags: R
This is something I have wanted to do for a long time: find a way to copy objects from R to the clipboard. It is a much better way to move small amounts of data in and out of an R session.
MeDIP-seq Analysis 2: Peak Analysis
November 17, 2020
Tags: Medip-seq, DiffBind, Peaks
Continuing the MeDIP-seq analysis, after preprocessing I now want to get peaks for each phenotype group as well as their differential comparisons. The software I am using here is MACS2 and DiffBind; here I just record some code.
Solve Bugs: C++14 standard requested but CXX14 is not defined
November 10, 2020
Tags: R
I updated my Bioconductor to 3.12 and then reinstalled all my packages (so sad...). However, the ChAMP package reported that sparseMatrixStats failed to install because of this error.
MeDIP-seq Analysis 1: Preprocessing
November 09, 2020
Tags: Medip-seq, Preprocess
Recently I have been working on a MeDIP-seq dataset containing 4 phenotypes, and it is my first time working on this type of data, so I decided to record this pipeline.
R function combn to create 1v1 pairs from vector
November 04, 2020
Tags: R
A very useful R function to generate pairs from a list of options, suitable for automated pairwise comparisons.
R package ganttrify for Gantt Chart
September 17, 2020
Tags: R
Sometimes I need to draw a Gantt chart, but I found that many online tools are quite expensive, so I looked for R tools, and ganttrify works super well.
RRBS Analysis 3: Differentially Methylated Probes
September 16, 2020
Tags: RRBS, methylKit
Continuing my RRBS analysis, the key point now is to get the differentially methylated probes. I encountered a serious issue with the P-value distribution: it showed a bimodal pattern, which is not a good sign in most cases, so I tried to solve it in this post.
RRBS Analysis 2: MethylKit
September 14, 2020
Tags: RRBS, methylkit
Continuing my RRBS analysis, what I need to do now is work with the Bismark-mapped BAM files and convert them into readable CpG information. The package I am using here is methylKit.
RRBS Analysis 1: Preprocess
September 13, 2020
Tags: RRBS, Preprocess
Recently I got a quick task to analyse RRBS data. Since I had no experience with that kind of data, I record my analysis steps here. This is the first post on this analysis, and it focuses only on preprocessing, from BCL files to the final BAM files. Note that this data was generated by NuGEN, so some unique scripts are used here.
Update Bioconductor package via git
September 13, 2020
Tags: Github, ChAMP
Since I maintain ChAMP, I constantly need to update code, add features and similar stuff. I have been using the new git-based Bioconductor system for a while; here I just record some code for reference.
R Package corrplot for Correlation Plotting
September 08, 2020
Tags: R Package, Correlation
A nice R package for drawing correlation plots; I should organise a good script for easy long-term use.
UCL Citrix Client Install on Linux
September 08, 2020
Tags: linux
Since I cannot connect my Linux computer to the printers at UCL, I have to use Citrix for printing. However, it is a little tricky to install the Citrix client on Linux. Here I just record what I did in the past ten minutes, in case I need to look it up again one day.
WGBS Preprocessing with GemBS
September 07, 2020
Tags: WGBS, Epigenetic, Methylation, GemBS
This year (2020) I have already encountered MeDIP-seq and RRBS, and now I need to work on WGBS data. It is my first time with this type of data, so I just want a quick and easy solution for it.
Install R and reinstall Packages
August 19, 2020
Tags: R
It is an absolute nightmare to install R and R packages, not to mention that there are packages in this world as horrible as my ChAMP... Basically every year I need to install a newer version of R and reinstall packages...
My Shell Commands
July 27, 2020
Tags: Shell
This is just a simple post to record some of my commonly used bash scripts, so that I can copy-paste them quickly.
Github only Upload Certain Types of Files
July 20, 2020
Tags: Github
A short note about how to set up .gitignore.
Set up UCL VPN for Linux System
July 17, 2020
Tags: Linux, Shell
It seems UCL only provides detailed guides on how to set up the VPN for Mac and Windows, not Linux. So here I record the script I am using to do it.
Fast Delete Large Files in Linux
July 17, 2020
Tags: Shell
A way to quickly delete files in Linux. It is faster than rm and can be used when I have a lot of files to delete.
My ggplot2 Plots Code
July 12, 2020
Tags: R, ggplot2, visualisation
This is a note recording my quick code for drawing comparatively nice figures with ggplot2.
Bash Script to Generate Note Template
July 09, 2020
Tags: Shell, blog
My new personal website is working super well. However, every time I want to create a new note/post, I need to manually create a folder and a markdown file, copy-paste the header text from another post, and then change the name. So I wrote a simple bash script to automatically create the folder, the file and the initial text for new notes.
My Samtools Command
July 08, 2020
Tags: samtools
This is a note of my regularly used Samtools commands.
Regularly Backup MongoDB with R script
July 07, 2020
Tags: R, rds, Github
I am maintaining the GCGR website, so I think it is a good idea to back up the database regularly. My idea is to regularly run some R code (because I am only good at R) and scp/push the dumped files to RDS and Github separately.
My Command for PostgreSQL
July 06, 2020
Tags: postgreSQL
Recently one of my projects has been using PostgreSQL; here I just record some common commands I use to manipulate PostgreSQL.
Gitpod for Website Update
July 01, 2020
Tags: Github, gitpod
After creating my Gatsby website, I constantly need to write notes for it. Normally I open the project locally, run gatsby develop, etc., and after creating a new note, deploy and push. Here I tried a way to modify the code online, which is quicker and works anywhere.
React Density Plot for Methylation Array
June 30, 2020
Tags: React, methylation array
In one of my collaboration projects, I sometimes need to show density plots to all team members, and I think a good way is to deploy the figures online. But most density plots are just static R figures, which makes it hard to identify individual samples. So I quickly created a React app that shows density plots online.
My Github Commands
June 29, 2020
Tags: Github
For many years, I only used commands like git add, git commit, git push, etc. Now I am collaborating with more and more professional people on Github, so I want to record the commands I have learned. It is not systematic, but it may serve as a quick cheatsheet.
Gatsby Get Current Url Pathname
June 28, 2020
Tags: gatsby
I want the layout title to show a different font size depending on the page, for example a larger font on the note list page but a small button on the blog page.
Deploy Gatsby to Github User Page
June 28, 2020
Tags: gatsby, Github
After some development, I have now finished a basic version of my notes site, and I want to deploy it on my Github. I like the idea of using XXXX.github.io as the domain name, which looks even better than a purchased domain. So I need to find out how to do it.
Backup and Restore MongoDB
June 28, 2020
Tags: MongoDB
Since I am developing the GCGR website, I will constantly need to modify the database, so I may often need to transfer data from online to offline. Here I record the commands I use to dump and restore data.
Download from EGA via Python
June 28, 2020
Tags: EGA
Since I am working on a project involving EGA data, I need to download it from the website. After some learning, I downloaded it successfully. Here I record the process.
Gatsby-Personal-Website
June 27, 2020
Tags: gatsby, 2020
I can't remember how many times I have re-created my personal website. This time I used Gatsby (React), a nice framework for it.