Essential C++ Knowledge
December 12, 2023
C++
I have not coded in C++ for exactly 10 years, since graduating from the Department of CS. Now I need to pick it up again. Here are some very short notes to help me do that.
Work on BAM and Sequence with R
April 06, 2023
R
BAM
String
BAM files are large and are normally handled with low-level languages like C/C++. Since I mostly use R, here is a collection of my code to read and modify BAM files.
Use Cron job in Linux
March 20, 2023
Linux
DevOps
A new trick: setting up automatic cron jobs for my Linux tasks.
Work with SQLite with R or Shell
March 19, 2023
SQLite
R
Shell
In my work I sometimes want to create a lightweight database, and compared with databases like PostgreSQL or MongoDB, SQLite is a really good choice. Here I record some key code I use to interact with SQLite in R or Shell.
ChAMP 3.0: Illumina HumanMethylation EPIC V2
February 17, 2023
ChAMP
I am updating ChAMP for the new Illumina Human Methylation Array, V2. Here are some notes.
Popular Regression Models I should know
January 10, 2023
Regression
While taking the Coursera regression course, I wanted to keep a collection of the commonly used regression models here.
Coursera - Regression Models
December 31, 2022
Coursera
Regression
R
Regression is an important tool that I should have dug into years ago. Here are some key notes I wrote down while taking the Coursera course Regression Models.
Regex snippets in R
December 02, 2022
R
Regex
My collection of R regex snippets, which will be used extensively in my daily work.
Complex ggplot2: boxplot
September 27, 2022
ggplot2
My collection of code for ggplot2 boxplots.
Install MongoDB on Mac M1 CPU
September 21, 2022
MongoDB
I never thought it would be so hard to install MongoDB on a Mac. It took me over an hour to figure out. Here are the steps and notes I can follow in the future.
Complex ggplot2: ScatterPlot
September 02, 2022
ggplot2
R
Here are some nice plots I drew with ggplot2, for copy and paste in the future.
Apply H2O.ai for Quick AutoML Task
August 16, 2022
Machine Learning
AutoML
I often need to use Machine Learning to quickly create a model for classification. AutoML seems to be a really good way to do it. Here I want to record my code snippets for running H2O.ai, so that in the future I can apply this code quickly for prototyping.
Calculate VCF similarity with somalier
August 15, 2022
VCF
My task is to calculate the similarity between VCF files, and somalier did this successfully. Here is some code I may use in the future.
Liftover VCF from hg37 to hg38 with Picard
August 14, 2022
VCF
Liftover
In a recent task, I needed to compare a set of VCFs generated from hg37 with another set of VCFs generated from hg38. Thus, I needed to lift over the hg37 VCFs to hg38. The solution I found is to use Picard.
Understand VCF File Format
August 11, 2022
VCF
Mutation
Sarek
It is a shame that I never looked into the VCF file format closely, merely because I have been working on methylation. Now I finally have a chance to get familiar with the mutation pipeline and VCF files.
R Plotly Quick Code
July 24, 2022
R
Plotly
Web
Visualisation
I have not used Plotly for a couple of years, but it is still such a good tool for interactive figures. Here is the code I used to draw Plotly figures.
Complex ggplot2: Line Plot
July 04, 2022
R
ggplot2
Here are some nice plots I drew with ggplot2, for copy and paste in the future.
Complex Heatmaps with R
July 01, 2022
R
Heatmap
Visualisation
My code notes on how to generate nice and complex heatmaps.
My R Snippets
July 01, 2022
R
Pieces of R code I may need to copy-paste in future work.
PCA Projection on Website with R script run at backend
June 11, 2022
R
React
Full-Stack
Drawing PCA plots is very common in the bioinformatics world. It is also cool to let a PCA model project new data onto the original plot, so that new users can see how their data cluster relative to the old PCA plot. I managed to create a single front-end-to-back-end webpage to do it online.
JavaScript Snippets: Promise for multiple query
June 04, 2022
JavaScript
Backend
MongoDB
In one of my backend projects, I need to query multiple tables and gather the results after all the searches are done. Thus, a normal promise will not work.
Nanopore Analysis: Bonito for BaseCalling
June 02, 2022
Nanopore
After using Guppy, I was recommended Bonito for base calling; it is said to have pretty high accuracy compared with the old method. More importantly, it can directly use Remora for methylation analysis. Here is a record of how I use Bonito for base calling.
Parallel in R
May 04, 2022
R
parallel
Here is a simple record of several ways to run R code in parallel.
ChAMP 3.0: champ.Overlap()
April 12, 2022
ChAMP
annotation
A new function added to ChAMP for general mapping between two sets of segments on the genome, for example mapping CpGs, peaks, etc. to genes, CpG islands, etc.
Complex ggplot2: Barplot
April 07, 2022
visualisation
ggplot2
ggplot2 is nice, but can be quite complex... Here are some nice ggplot2 barplots I have drawn.
Nanopore Analysis: Handle Fast5 File
April 05, 2022
nanopore
alignment
Recently I have been working on Nanopore data. First, I am trying to set up the pipeline for 5hmC/5mC and SNP calling. In this post, I record my first exploration of fast5 data, which is the default output format for Nanopore machines.
Nanopore Analysis: Guppy for BaseCalling
April 05, 2022
nanopore
Trying out Guppy, the official software provided by Nanopore for base calling, which means converting "pore signals" into fastq ATCG information. Guppy can actually also do alignment, barcoding, trimming, etc.
Access to hg38 knownGene Genome Annotation
March 28, 2022
annotation
In my previous post, I compared 5 different versions of genome annotation, and knownGene seems to be the best one to use. It contains the most transcripts and is gradually updated along with GENCODE. Here I record how to fetch, organise, and use knownGene.
Which Human Genome Annotation should I use?
March 24, 2022
annotation
Genome annotation is vital and fundamental to all bioinformatic analysis. However, unexpectedly, I found that the differences and relationships between genome annotations are not easy to find. Here is my record of looking into genome annotation.
Integration Docker with Nextflow
March 12, 2022
nextflow
docker
After learning the minimum about Docker and Nextflow, I now want to combine the two. To me, the hardest part of both Docker and Nextflow is the file system: files and folders must match perfectly between the containers and the host machine.
Essential Docker Knowledge
March 10, 2022
docker
nextflow
I have recently been working on a Nextflow pipeline that contains a couple of processes. We want to use Docker to run each process, so here I am writing up my coding notes on how to create a Docker image for a process, run it, collect the results, etc.
Essential Nextflow knowledge
March 09, 2022
nextflow
I am now learning Nextflow. I feel that the tutorials cover too many parameters and too much syntax that I will not actually need. Thus, here I just record the key patterns I will use in my work.
My ChIP-Seq Preprocess Pipeline
February 28, 2022
ChIP-Seq
Pipeline
I am now organising the code for a ChIP-seq paper, so I am recording the preprocessing steps for future use.
Learn Nextflow: Part 1
February 10, 2022
nextflow
Recently I needed to write a Nextflow pipeline for RRBS analysis, so I learned a bit of Nextflow. Here is my record; it contains what I consider the most important and essential knowledge about this pipeline tool.
Use CollectHsMetrics (Picard) on GemBS Bam file
February 09, 2022
Debug
BAM
GemBS
I spent two days making CollectHsMetrics run on a BAM file created by GemBS. The bug was caused by different ways of generating MD5 values. Here is a record of how to make it work.
ChAMP 3.0: champ.PrepareManifest()
February 06, 2022
ChAMP
A new function added to the ChAMP family: champ.PrepareManifest() reads a raw Illumina manifest (or a custom one) to create ChAMP-readable probe-to-CpG mapping information.
Learning Machine Learning-2: Model Selection
February 06, 2022
machine learning
For a long time I did not study machine learning systematically; it was called pattern recognition when I was at university. Now I am reading 《机器学习》 (Machine Learning), written by a famous Chinese machine learning expert. This is the second chapter of the book, a quick introduction to Model Selection.
Nextflow: nf-core sarek
January 27, 2022
nextflow
WGS
For many years I have wanted to learn Nextflow but never got the chance; now it is time. My task is to run nf-core sarek on PGP-UK WGS/WES data. It is also my first time trying to call SNP information out of WGS data.
Download SRA Sequence
January 17, 2022
SRA
A record of some commands that can be used for SRA download.
Modify Image exif Information
December 31, 2021
R
On the last day of 2021, I made a website for my lovely doggy Mountain. However, while organising two years' worth of his photos, I noticed that the timestamps on all the images from my camera were wrong. So I tried to fix it by re-annotating the image timestamps.
Install Hackintosh Mojave on S200 Machine
December 16, 2021
Hackintosh
This is my record of how to install Hackintosh Mojave on my S200 machine. This version of Mac OS is already a bit outdated, but I am not sure if my machine can run a newer version, so here is just a record and some links I can check later.
Generate Single Cell Reference for EpiSCORE
December 07, 2021
cell-type-deconvolution
methylation
This is my walk-through of how to reproduce the EpiSCORE single-cell matrix. I will use this note to create tissue cell-type fraction matrices in the future.
Commands needed to install R on Ubuntu
December 06, 2021
R
It is always hard to install R on Linux by compiling it... Here is some quick code I recorded for installing R on an Ubuntu system.
Import GemBS output into methylKit
November 28, 2021
RRBS
methylKit
Previously I preprocessed RRBS data with GemBS; now I want to continue with the downstream analysis. The tool I am using is methylKit, so I found a way to import the GemBS output into methylKit.
Analyse RRBS with GemBS
November 18, 2021
RRBS
Pipeline
One of my recent projects is to run RRBS analysis with GemBS. This is some code I recorded during my analysis.
My R Data.Table Quick Code
October 19, 2021
R
I have finally started to use data.table; it is really fast and cool. However, learning data.table is a bit like learning key functions such as aggregate, etc. Here I record some of my key code.
Send Email in R with mailR
October 05, 2021
R
In many cases we may want to send email from R. For example, when running a very long program, we may want a reminder email once it finishes. Or, like me, you may want to create a website with an R backend that does some work automatically. Based on my tests, mailR is one solution.
Learn Bayesian Inference 1: Understand Bayesian Probability
September 06, 2021
Bayesian
For a long time I have wanted to learn Bayesian Inference, and this is finally a start. The first lesson is to better understand Bayesian Probability.
Bias in ChIP-seq visualisation
August 24, 2021
ChIP-seq
IGV
MAnorm2
Recently I have been working on some ChIP-seq data. However, after normalisation, I noticed that some of my results did not agree with the IGV visualisation. I found a way to adjust the normalised plot, and it works well.
MAnorm2 package for hMeDIP-seq 2: Normalisation & Analysis
August 05, 2021
MAnorm2
MeDIP-seq
After the preparation work with MAnorm2_utils, I got the read counts and occupancy matrix, so I could try reading them into MAnorm2. Based on my tests, it gave me the best normalisation result and the most easy-to-plot distribution.
MAnorm2 package for hMeDIP-seq 1: Preparation
August 03, 2021
MAnorm2
MeDIP-seq
Preprocess
After working on QSEA and DiffBind, I now want to try the last solution, MAnorm2, for my hMeDIP-seq data. Contrary to my expectation, it is indeed a bit hard to run. This is the first half of my record, which covers the preparation with MAnorm2_utils.
QSEA package for hMeDIP-seq preprocess
August 02, 2021
MeDIP-seq
Preprocess
I am working on some hMeDIP-seq data. Previously I used the MACS2 + DiffBind pipeline, but I eventually got some unexpected results. I think the reason could be the filtering of peaks during the preprocessing stage. So I found the QSEA R package, which reads BAM files for analysis.
GO bubble plot with David and ggplot2
May 25, 2021
R
GO
ggplot2
For a long time I have wanted a way to handle GO analysis in my daily work. Eventually I concluded that David + ggplot2 is not a bad option: it is easy to use, easy to understand, and David is famous enough for most analyses.
Approaches to quickly get gene or promoter coordinates
May 03, 2021
R
Annotation
It is such a common requirement in bioinformatics to get gene coordinates or promoter region locations across the whole genome. Here I want to record ways to do it.
Create Desktop Apps with React NodeGUI
April 24, 2021
React
NodeGUI
Recently I have wanted to create a cross-platform app. Since I only know React, I am looking for possible solutions, such as Electron and React NodeGUI. Today is my first try with React NodeGUI, and there are indeed some tricky parts behind the starter code.
Read count and initial quality check plots on BAM files
March 14, 2021
R
MeDIP-seq
BAM
PCA
I want to quickly check the quality of a set of BAM files generated by bowtie2. However, I did not see many tools for this; eventually I found that multiBamSummary is one solution. Here I record how I get a read count matrix from a set of BAM files, then draw a quick Plotly plot for visualisation.
A deeper look at multi-dimensional array() and apply() in R
March 06, 2021
R
Recently my wife asked how to understand a 3-dimensional array in R, and how to understand the apply() function when it is applied to a multi-dimensional array. After digging further, I think it is really interesting...
Develop Gatsby on Kindle Browser
February 19, 2021
Kindle
Gatsby
Previously I tested that Gatsby works on Kindle. Now the problem is how to develop for it: I do not want to deploy my code to a Github page every time. Finally I found that Gatsby Serve works for Kindle.
Kindle Browser access Gatsby and React Github page
February 10, 2021
Github
Gatsby
React
Previously I was working on a cool, simple project, Feed-Ink, which is a tool to help me read RSS on my Kindle. However, after nearly all the development was done, I found that I could not use the Kindle to access the static page deployed on Github. Here I want to test this a bit.
Using MethylCIBERSORT for Cell Type Deconvolution
January 18, 2021
cell-type-deconvolution
methylation
Cell type deconvolution is an important angle of analysis for DNA methylation or RNA-seq data. Compared with the previous refbase/reffree methods, there is a recent package called MethylCIBERSORT, which can also do this and is quite easy to use.
Compare Transcription Factor Peaks between ENCODE and TFregulomeR
January 17, 2021
TFBS
TFEA
Previously I found a very good database, TFregulomeR. However, before I move on, I want to check the quality of TFregulomeR by comparing its peaks to ENCODE.
Use TFregulomeR to get Transcription Factor Binding Peaks
January 16, 2021
TFBS
TFEA
Recently I have been trying to do a set of Transcription Factor Enrichment Analyses for my 5hmC (hMeDIP-seq) data. I used to use ENCODE for this, but I eventually found that ENCODE has very few TFs for my cell line (Mouse Intestine). So I started looking for other potential databases and tools, and I found a lot indeed. Among them, TFregulomeR is a new but quite powerful tool.
A ChAMP function to generate various Gene Features
January 03, 2021
ChAMP
GeneFeature
Annotation
Recently, while working on the latest Mouse Methylation Array, I found that the manifest does not have gene annotation such as promoter, TSS200, exon, etc., so here I wrote a function to generate all these gene features from UCSC refGene.
Generate ChAMP Annotation from Illumina manifests
January 01, 2021
ChAMP
Annotation
I am coding the third version of ChAMP, and this is the first task I have encountered. I need to convert the Illumina CSV to ChAMP Annotation. Previously I only used some ad hoc code to achieve this, but this time I decided to turn it into a script for future use.
Colourful output in R in terminal
December 30, 2020
R
I have recently been improving ChAMP, and I hope to improve its message output a bit. More specifically, I want the printed output to be better formatted, with 2-3 colour styles to indicate important messages or code snippets. Finally I found some ways to do this.
Connect S3 with AWS CLI
December 13, 2020
aws
S3
My first time doing anything related to AWS. I had never used anything related to AWS because I did not want to pay for it. Thanks to my friends, I now have a chance to get to know this famous cloud service provider.
Copy R objects into Clipboard
November 17, 2020
R
This is something I have wanted to do for a long time: find a way to copy objects from R to the clipboard. It is a much better way to move small amounts of data in and out of an R session.
MeDIP-seq Analysis 2: Peak Analysis
November 17, 2020
Medip-seq
DiffBind
Peaks
Continuing the MeDIP-seq analysis: after preprocessing, I now want to get peaks for each pheno group and for their differential comparisons. The software I am using here is MACS2 and DiffBind, and here I just record some code.
Solve Bugs: C++14 standard requested but CXX14 is not defined
November 10, 2020
R
I updated my Bioconductor to 3.12, then reinstalled all my packages (so sad...). However, the ChAMP package reported that sparseMatrixStats failed to install because of this error.
MeDIP-seq Analysis 1: Preprocessing
November 09, 2020
Medip-seq
Preprocess
Recently I have been working on a MeDIP-seq dataset, which contains 4 phenotypes, and it is my first time working on this type of data. So I decided to record this pipeline.
R function combn to create 1v1 pairs from vector
November 04, 2020
R
A very useful R function to generate pairs from a list of options, suitable for automating pairwise comparison work.
R package ganttrify for Gantt Chart
September 17, 2020
R
Sometimes I need to draw a Gantt Chart, but I found that many online tools are very expensive, so I looked for R tools, and ganttrify works super well.
RRBS Analysis 3: Differentially Methylated Probes
September 16, 2020
RRBS
methylKit
Continuing my RRBS analysis, the key point now is to get the Differentially Methylated Probes out. I encountered a serious issue with the P-value distribution: it shows a bimodal pattern, which is not a good sign in most cases, so I tried to solve it in this post.
RRBS Analysis 2: MethylKit
September 14, 2020
RRBS
methylkit
Continuing my RRBS analysis, what I need to do now is take the Bismark-mapped BAM files and convert them into readable CpG information. The package I am using here is MethylKit.
RRBS Analysis 1: Preprocess
September 13, 2020
RRBS
Preprocess
Recently, I got a quick task to analyse RRBS data. Since I had no experience with that kind of data format, here I record my analysis steps. This is the first post related to this data analysis work; it focuses only on the preprocessing from bcl files to the final BAM files. Note that this data was generated by the company NuGEN, so some unique scripts are used here.
Update Bioconductor package via git
September 13, 2020
Github
ChAMP
Since I am maintaining ChAMP, I constantly need to update some programs, add features, and similar stuff. I have been using the new git-based Bioconductor system for a while, and here I am just recording some code for reference.
UCL Citrix Client Install on Linux
September 08, 2020
linux
Since I cannot connect my Linux computer to the printers at UCL, I have to use Citrix for printing. However, it is a little bit tricky to install the Citrix Client on Linux. Here I just record what I have done in the past ten minutes, in case one day I have to look it up again.
R Package corrplot for Correlation Plotting
September 08, 2020
R Package
Correlation
A nice R package for drawing correlation plots. I should organise a good script for easy long-term use.
WGBS Preprocessing with GemBS
September 07, 2020
WGBS
Epigenetic
Methylation
GemBS
This year (2020) I have already encountered MeDIP-seq and RRBS; now I need to work on WGBS data. It is my first time handling this type of data, so I just want a quick and easy solution for it.
Install R and reinstall Packages
August 19, 2020
R
It is an absolute nightmare to install R and R packages, not to mention that in this world there are some packages as horrible as my ChAMP... Basically every year I need to install a newer version of R and reinstall my packages...
My Shell Commands
July 27, 2020
Shell
This is just a simple post to record some of my commonly used bash scripts, so that I can copy and paste them quickly.
Github only Upload Certain Types of Files
July 20, 2020
Github
A short note about how to set up .gitignore
Fast Delete Large Files in Linux
July 17, 2020
Shell
A way to quickly delete files in Linux. Faster than rm, it could be used when I have a lot of files to delete.
Set up UCL VPN for Linux System
July 17, 2020
Linux
Shell
It seems UCL only provides a detailed guide on how to set up the VPN for Mac and Windows, not for Linux. So here I record the script I am using to do it.
My ggplot2 Plots Code
July 12, 2020
R
ggplot2
visualisation
This is a note to record my quick code for drawing comparably nice figures with ggplot2.
Bash Script to Generate Note Template
July 09, 2020
Shell
blog
My new personal website is working super well. However, every time I want to create a new note/post, I need to manually create a folder, then a markdown file, then copy and paste the header text from another post, then change the name. So I wrote a simple bash script to automatically create the folder, file, and initial text for my new notes.
My Samtools Command
July 08, 2020
samtools
This is a note for my regularly used Samtools commands.
Regularly Backup MongoDB with R script
July 07, 2020
R
rds
Github
I am maintaining the GCGR website, so I think it is a good idea to back up the database regularly. My idea is to regularly run some R code (because I am only good at R), and scp/push the dumped files to RDS and Github separately.
My Command for PostgreSQL
July 06, 2020
postgreSQL
Recently one of my projects has been using PostgreSQL, so here I just record some common commands I use to manipulate PostgreSQL.
Gitpod for Website Update
July 01, 2020
Github
gitpod
After creating my Gatsby website, I need to constantly write notes for it. The normal way I do it is to open the project locally, run gatsby develop, etc. Then, after creating a new note, I deploy and push. Here I tried a way to modify the code online, which is quicker and works from anywhere.
React Density Plot for Methylation Array
June 30, 2020
React
methylation array
In one of my collaboration projects, I sometimes need to show density plots to all members of the team. I think a good way is to deploy the figures online. But most density plots are just pure R figures, which are hard to identify. So I quickly created a React app that shows density plots online.
My Github Commands
June 29, 2020
Github
For many years, I only used commands like git add, git commit, git push, etc. Now I am collaborating with more and more professional people on Github, so I want to record the commands I have learned here. It is not systematic, but it may serve as a quick cheatsheet.
Download from EGA via Python
June 28, 2020
EGA
Since I am working on a project related to an EGA dataset, I need to download it from the website. After learning a bit, I successfully downloaded it. Here I recorded the process.
Backup and Restore MongoDB
June 28, 2020
MongoDB
Since I am developing the GCGR website, I constantly need to modify the database, so I may often need to transfer the data from online to offline. Here I want to record the commands I use to dump the data and restore it.
Deploy Gatsby to Github User Page
June 28, 2020
gatsby
Github
After some time in development, I have now finished a basic version of my notes site. I want to deploy it on my Github; I like the idea of using XXXX.github.io as the domain name, which looks even better than a purchased domain. So I need to find out how to do it.
Gatsby Get Current Url Pathname
June 28, 2020
gatsby
I want the layout title to show a different font size depending on the page. For example, a larger font on the note list page, but a small button on the blog page.
Gatsby-Personal-Website
June 27, 2020
gatsby
2020
I can't remember how many times I have re-created my personal website. This time I used Gatsby React, a nice framework for it.