Essential C++ Knowledge

December 12, 2023

C++

I have not coded in C++ for exactly 10 years, since graduating from the Department of CS. Now I need to pick it up again. Here are some very short notes to help me do that.

Work on BAM and Sequence with R

April 06, 2023

R
BAM
String

BAM files are large and are normally handled with low-level languages like C/C++. Since I mostly use R, here is a collection of my code for reading and modifying BAM files.

Use Cron job in Linux

March 20, 2023

Linux
DevOps

A new trick: setting up automatic cron jobs for my Linux tasks.

Work with SQLite with R or Shell

March 19, 2023

SQLite
R
Shell

In my work I sometimes want to create a lightweight database, and compared with databases like PostgreSQL or MongoDB, SQLite is a really good choice. Here I record some key code I use to interact with SQLite in R or the shell.

ChAMP 3.0: illumina HumanMethylation EPIC V2

February 17, 2023

ChAMP

I am updating ChAMP for the new Illumina Human Methylation Array, V2. Here are some notes.

Popular Regression Models I should know

January 10, 2023

Regression

While taking the Coursera regression course, I wanted to keep a collection of commonly used regression models here.

Coursera - Regression Models

December 31, 2022

Coursera
Regression
R

Regression is an important tool that I should have dug into years ago. Here are some key notes I wrote down while taking the Coursera course Regression Models.

Regex snippets in R

December 02, 2022

R
Regex

My collection of R regex snippets, which will be used extensively in my daily work.

Complex ggplot2: boxplot

September 27, 2022

ggplot2

My collection of code for ggplot2 boxplots.

Install MongoDB on Mac M1 CPU

September 21, 2022

MongoDB

I never thought it would be so hard to install MongoDB on a Mac. It took me over an hour to figure out. Here are the steps and notes I can follow in the future.

Complex ggplot2: ScatterPlot

September 02, 2022

ggplot2
R

Here are some nice plots I drew with ggplot2, for copying and pasting in the future.

Apply H2O.ai for Quick AutoML Tasks

August 16, 2022

Machine Learning
AutoML

I often need to use machine learning to quickly create a classification model, and AutoML seems to be a really good way to do it. Here I record my code snippets for running H2O.ai, so that in the future I can apply this code quickly for prototyping.

Calculate VCF similarity with somalier

August 15, 2022

VCF

My task was to calculate the similarity between VCF files, and somalier did this successfully. Here is some code I may use in the future.

Liftover VCF from hg37 to hg38 with Picard

August 14, 2022

VCF
Liftover

In a recent task, I needed to compare a set of VCFs generated from hg37 with another set of VCFs generated from hg38. Thus, I needed to lift over the hg37 VCFs to hg38. The solution I found was to use Picard.

Understand VCF File Format

August 11, 2022

VCF
Mutation
Sarek

It is a shame that I never looked into the VCF file format closely, merely because I have been working on methylation. Finally, I now have a chance to get familiar with the mutation pipeline and VCF files.

R Plotly Quick Code

July 24, 2022

R
Plotly
Web
Visualisation

I have not used Plotly for a couple of years, but it is still such a good tool for interactive figures. Here is the code I use to draw Plotly figures.

Complex ggplot2: Line Plot

July 04, 2022

R
ggplot2

Here are some nice plots I drew with ggplot2, for copying and pasting in the future.

Complex Heatmaps with R

July 01, 2022

R
Heatmap
Visualisation

My code notes on how to generate nice, complex heatmaps.

My R Snippets

July 01, 2022

R

Pieces of R code I may need to copy-paste in future work.

PCA Projection on Website with R script run at backend

June 11, 2022

R
React
Full-Stack

Drawing PCA plots is very common in the bioinformatics world. It is also cool to let a PCA model project new data onto the original plot, so that users can see how their new data cluster with the old PCA plot. I managed to create a single front-end-to-back-end web page to do this online.

JavaScript Snippets: Promise for multiple query

June 04, 2022

JavaScript
Backend
MongoDB

In one of my backend projects, I need to query multiple tables and gather the results after all the searches are done. Thus, a normal promise will not work.

Nanopore Analysis: Bonito for BaseCalling

June 02, 2022

Nanopore

After using Guppy, I was recommended to use Bonito for base calling. It is said that Bonito has pretty high accuracy compared with the old method. More importantly, it can directly use Remora for methylation analysis. Here is a record of how I use Bonito for base calling.

Parallel in R

May 04, 2022

R
parallel

Here is a simple record of several ways to run R code in parallel.

ChAMP 3.0: champ.Overlap()

April 12, 2022

ChAMP
annotation

A new function added to ChAMP for general mapping between two sets of segments on the genome, such as mapping CpGs, peaks, etc. to genes, CpG islands, etc.

Complex ggplot2: Barplot

April 07, 2022

visualisation
ggplot2

ggplot2 is nice, but it can be quite complex... Here are some nice ggplot2 barplots I have drawn.

Nanopore Analysis: Handle Fast5 File

April 05, 2022

nanopore
alignment

Recently I have been working on Nanopore data. First, I am trying to set up the pipeline for 5hmC/5mC and SNP calling. In this post, I record my first exploration of fast5 data, which is the default output format of Nanopore machines.

Nanopore Analysis: Guppy for BaseCalling

April 05, 2022

nanopore

Trying out Guppy, the official software provided by Nanopore for base calling, which means converting "pore signals" into fastq ATCG information. Guppy can actually also do alignment, barcoding, trimming, etc.

Access to hg38 knownGene Genome Annotation

March 28, 2022

annotation

In my previous post, I compared 5 different versions of genome annotation, and knownGene seems to be the best one to use. It contains the most transcripts and is updated gradually along with GENCODE. Here I record how to fetch, organise, and use knownGene.

Which Human Genome Annotation should I use?

March 24, 2022

annotation

Genome annotation is vital and fundamental to all bioinformatics analysis. However, unexpectedly, I found that the differences and relationships between genome annotations are not easy to find. Here is my record of looking into genome annotation.

Integrate Docker with Nextflow

March 12, 2022

nextflow
docker

After learning the minimum about Docker and Nextflow, I now want to combine the two. To me, the hardest part of both Docker and Nextflow is the file system, so files and folders must match perfectly between the containers and the host machine.

Essential Docker Knowledge

March 10, 2022

docker
nextflow

I have recently been working on a Nextflow pipeline that contains a couple of processes. We want to use Docker to run each process, so here I am writing up how to create a Docker image for a process, run it, collect the results, etc.

Essential Nextflow knowledge

March 09, 2022

nextflow

I am now learning Nextflow. I feel that the tutorial covers too many parameters and too much syntax that I will not actually need. Thus, here I just record the key patterns I will use in my work.

My ChIP-Seq Preprocess Pipeline

February 28, 2022

ChIP-Seq
Pipeline

I am now organising code for a ChIP-seq paper, so I am recording the preprocessing steps for future use.

Learn Nextflow: Part 1

February 10, 2022

nextflow

Recently I needed to write a Nextflow pipeline for RRBS analysis, so I learned a bit of Nextflow. Here is my record, which contains the most important and essential things I have understood about this pipeline tool.

Use CollectHsMetrics (Picard) on GemBS BAM file

February 09, 2022

Debug
BAM
GemBS

I spent two days getting CollectHsMetrics to run on a BAM file created by GemBS. The bug was caused by different ways of generating the MD5. Here is a record of how to make it work.

ChAMP 3.0: champ.PrepareManifest()

February 06, 2022

ChAMP

A new function added to the ChAMP family: champ.PrepareManifest() reads a raw Illumina manifest (or a custom one) and creates ChAMP-readable probe-to-CpG mapping information.

Learning Machine Learning-2: Model Selection

February 06, 2022

machine learning

For a long time I did not study machine learning systematically; it was called pattern recognition when I was at university. Now I am reading 《机器学习》 (Machine Learning), written by a famous Chinese machine learning expert. This post covers the second chapter of the book, a quick introduction to model selection.

Nextflow: nf-core sarek

January 27, 2022

nextflow
WGS

For many years I have wanted to learn Nextflow but never got the chance; now it is time. My task is to run nf-core sarek on PGP-UK WGS/WES data. It is also my first time trying to call SNPs from WGS data.

Download SRA Sequence

January 17, 2022

SRA

A record of some commands that can be used to download from SRA.

Modify Image exif Information

December 31, 2021

R

On the last day of 2021, I made a website for my lovely doggy Mountain. However, while organising two years' worth of his photos, I noticed that the timestamps on all the images from my camera were wrong. So I tried to fix this by re-annotating the image timestamps.

Install Hackintosh Mojave on S200 Machine

December 16, 2021

Hackintosh

This is my record of how to install Hackintosh Mojave on my S200 machine. This version of Mac OS is already a bit outdated, but I am not sure whether my machine can run a newer version, so here is just a record and some links I can check later.

Generate Single Cell Reference for EpiSCORE

December 07, 2021

cell-type-deconvolution
methylation

This is my trace-back of how to reproduce the EpiSCORE single-cell matrix. I will use this note to create tissue cell-type fraction matrices in the future.

Commands needed to install R on Ubuntu

December 06, 2021

R

It is always hard to install R on Linux by compiling it from source... Here are some quick commands I recorded for installing R on an Ubuntu system.

Import GemBS output into methylKit

November 28, 2021

RRBS
methylKit

Previously I preprocessed RRBS data with GemBS, and now I want to continue with the downstream analysis. The tool I am using is methylKit, so I found a way to import the GemBS output into methylKit.

Analyse RRBS with GemBS

November 18, 2021

RRBS
Pipeline

One of my recent projects is to run an RRBS analysis with GemBS. This is some code I recorded during my analysis.

My R Data.Table Quick Code

October 19, 2021

R

I have finally started to use data.table, and it is really fast and cool. However, learning to use data.table is a bit like learning a set of key functions such as aggregate, etc. Here I record my key code.

Send Email in R with mailR

October 05, 2021

R

In many cases we may want to send email from R. For example, when running a very long program, we may want a reminder email once it finishes. Or, like me, you may want to create a website with an R backend that automatically does some work. Based on my tests, mailR is one solution.

Learn Bayesian Inference 1: Understand Bayesian Probability

September 06, 2021

Bayesian

For a long time I have wanted to learn Bayesian inference, and this is finally a start. The first lesson is to better understand Bayesian probability.

Bias in ChIP-seq visualisation

August 24, 2021

ChIP-seq
IGV
MAnorm2

Recently I have been working on some ChIP-seq data. However, after normalisation, I noticed that some of my results did not agree with the IGV visualisation. I found a way to adjust the normalised plot, and it works well.

MAnorm2 package for hMeDIP-seq 2: Normalisation & Analysis

August 05, 2021

MAnorm2
MeDIP-seq

After the preparation work with MAnorm2_utils, I got the read count and occupancy matrices, which I can then read into MAnorm2. Based on my tests, it gave me the best normalisation result and the most easy-to-plot distribution.

MAnorm2 package for hMeDIP-seq 1: Preparation

August 03, 2021

MAnorm2
MeDIP-seq
Preprocess

After working with QSEA and DiffBind, I now want to try the last solution, MAnorm2, on my hMeDIP-seq data. Contrary to my expectation, it is indeed a bit hard to run. This is the first half of my record, which covers the preparation with MAnorm2_utils.

QSEA package for hMeDIP-seq preprocess

August 02, 2021

MeDIP-seq
Preprocess

I am working on some hMeDIP-seq data. Previously I used the MACS2 + DiffBind pipeline, but I eventually got some unexpected results. I think the reason could be the filtering of peaks during the preprocessing stage. So I found the QSEA R package, which reads BAM files directly for analysis.

GO bubble plot with David and ggplot2

May 25, 2021

R
GO
ggplot2

For a long time I have wanted to find a way to handle GO analysis in my daily work. In the end, I think David + ggplot2 is not a bad option: it is easy to use and easy to understand, and David is well known enough for most analyses.

Approaches to quickly get gene or promoter coordinates

May 03, 2021

R
Annotation

It is such a common requirement in bioinformatics to get gene coordinates or promoter region locations across the whole genome. Here I want to record ways to do it.

Create Desktop Apps with React NodeGUI

April 24, 2021

React
NodeGUI

Recently I have wanted to create a cross-platform app. Since I only know React, I am looking for possible solutions, such as Electron and React NodeGUI. Today is my first try with React NodeGUI, and there are indeed some tricky parts behind the starter code.

Read count and initial quality check plots on BAM files

March 14, 2021

R
MeDIP-seq
BAM
PCA

I want to quickly check the quality of a set of BAM files generated by bowtie2. However, I did not see many tools for this; eventually I found that multiBamSummary is one solution. Here I record how I get a read count matrix from a set of BAM files and then draw a quick Plotly plot for visualisation.

A deeper look at multi-dimensional array() and apply() in R

March 06, 2021

R

Recently my wife asked how to understand a 3-dimensional array in R, and how to understand the apply() function when it is applied to a multi-dimensional array. After digging further, I think it is really interesting...

Develop Gatsby on Kindle Browser

February 19, 2021

Kindle
Gatsby

Previously I tested that Gatsby works on Kindle. Now the problem is how to develop for it. I do not want to deploy my code to a Github page every time. Finally I found that Gatsby Serve works for Kindle.

Kindle Browser access Gatsby and React Github page

February 10, 2021

Github
Gatsby
React

Previously I was working on a cool, simple project called Feed-Ink, a tool to help me read RSS on my Kindle. However, after nearly all the development was done, I found that I could not use the Kindle to access the static page deployed on Github. Here I want to test this a bit.

Using MethylCIBERSORT for Cell Type Deconvolution

January 18, 2021

cell-type-deconvolution
methylation

Cell type deconvolution is an important angle of analysis for DNA methylation or RNA-seq data. Compared with the previous refbase/reffree methods, there is now a new package called MethylCIBERSORT, which can also do this and is quite easy to use.

Compare Transcription Factor Peaks between ENCODE and TFregulomeR

January 17, 2021

TFBS
TFEA

Previously I found a very good database, TFregulomeR. However, before I move on, I want to check the quality of TFregulomeR by comparing its peaks to ENCODE.

Use TFregulomeR to get Transcription Factor Binding Peaks

January 16, 2021

TFBS
TFEA

Recently I have been trying to do a set of Transcription Factor Enrichment Analyses for my 5hmC (hMeDIP-seq) data. I used to use ENCODE for this, but I eventually found that ENCODE has data for very few TFs in my cell line (Mouse Intestine). So I started looking for other potential databases and tools, and indeed found a lot. Among them, TFregulomeR is a new but quite powerful tool.

A ChAMP function to generate various Gene Features

January 03, 2021

ChAMP
GeneFeature
Annotation

Recently, while working on the latest Mouse Methylation Array, I found that the manifest does not have gene annotations such as promoter, TSS200, exon, etc., so I wrote a function to generate all these gene features from UCSC refGene.

Generate ChAMP Annotation from illumina manifests

January 01, 2021

ChAMP
Annotation

I am coding the third version of ChAMP, and this is the first task I have encountered. I need to convert the Illumina CSV into a ChAMP annotation. Previously I only used some ad hoc code to achieve this, but this time I decided to turn it into a script for future use.

Colourful output in R in terminal

December 30, 2020

R

I have recently been improving ChAMP, and I hope to improve its message output a bit. More specifically, I want the printed output to be better formatted, with 2-3 coloured styles to indicate important messages or code snippets. I finally found some ways to do this.

Connect S3 with AWS CLI

December 13, 2020

aws
S3

My first time doing anything related to AWS. I had never used anything related to AWS because I did not want to pay any money for it. Thanks to my friends, I now have a chance to get to know this famous cloud service provider.

Copy R objects into Clipboard

November 17, 2020

R

This is something I have wanted to do for a long time: find a way to copy objects from R to the clipboard. It is a much better way to move small amounts of data in and out of an R session.

MeDIP-seq Analysis 2: Peak Analysis

November 17, 2020

Medip-seq
DiffBind
Peaks

Continuing the MeDIP-seq analysis: after preprocessing, I now want to get the peaks for each phenotype group as well as their differential comparison. The software I am using here is MACS2 and DiffBind, and here I just record some code.

Solve Bugs: C++14 standard requested but CXX14 is not defined

November 10, 2020

R

I updated my Bioconductor to 3.12, then reinstalled all my packages (so sad...). However, the ChAMP package reported that sparseMatrixStats failed to install because of this error.

MeDIP-seq Analysis 1: Preprocessing

November 09, 2020

Medip-seq
Preprocess

Recently I have been working on a MeDIP-seq dataset that contains 4 phenotypes, and it is my first time working on this type of data. So I decided to record this pipeline.

R function combn to create 1v1 pairs from vector

November 04, 2020

R

A very useful R function for generating pairs from a list of options, suitable for automated pairwise comparison work.

R package ganttrify for Gantt Chart

September 17, 2020

R

Sometimes I need to draw a Gantt chart, but I found that many online tools are very expensive, so I looked for R tools, and ganttrify works super well.

RRBS Analysis 3: Differential Methylated Probes

September 16, 2020

RRBS
methylKit

Continuing my RRBS analysis, the key point now is to get the Differential Methylated Probes out. I encountered a serious issue with the P-value distribution: it shows a bimodal pattern, which is not a good sign in most cases, so I tried to solve it in this post.

RRBS Analysis 2: MethylKit

September 14, 2020

RRBS
methylkit

Continuing my RRBS analysis, what I need to do now is take the Bismark-mapped BAM files and convert them into readable CpG information. The package I am using here is MethylKit.

RRBS Analysis 1: Preprocess

September 13, 2020

RRBS
Preprocess

Recently, I got a quick task to analyse RRBS data. Since I had no experience with that kind of data, I am recording my analysis steps here. This is the first post related to this analysis work, and it focuses only on the preprocessing from the bcl files to the final BAM files. Note that this data was generated with a NuGEN kit, so some NuGEN-specific scripts are used here.

Update Bioconductor package via git

September 13, 2020

Github
ChAMP

Since I am maintaining ChAMP, I constantly need to update some programs, add features, and similar stuff. I have been using the new git-based Bioconductor system for a while, and here I just record some commands for reference.

UCL Citrix Client Install on Linux

September 08, 2020

linux

Since I cannot connect my Linux computer to the printers at UCL, I have to use Citrix to get my printing done. However, it is a little bit tricky to install the Citrix Client on Linux. Here I just record what I have done in the past ten minutes, in case one day I need to look it up again.

R Package corrplot for Correlation Plotting

September 08, 2020

R Package
Correlation

A nice R package for drawing correlation plots. I should put together a good script for easy long-term use.

WGBS Preprocessing with GemBS

September 07, 2020

WGBS
Epigenetic
Methylation
GemBS

This year (2020) I have already encountered MeDIP-seq and RRBS, and now I need to work on WGBS data. It is my first time working with this type of data, so I just want a quick and easy solution for it.

Install R and reinstall Packages

August 19, 2020

R

It is an absolute nightmare to install R and R packages, not to mention that in this world there are some packages as horrible as my ChAMP... Basically every year I need to install a newer version of R and reinstall my packages...

My Shell Commands

July 27, 2020

Shell

This is just a simple post to record some of my commonly used bash scripts, so that I can copy and paste them quickly.

Github only Upload Certain Types of Files

July 20, 2020

Github

A short note about how to set up .gitignore

Fast Delete Large Files in Linux

July 17, 2020

Shell

A way to quickly delete files in Linux. Faster than rm, it could be used when I have a lot of files to delete.

Set up UCL VPN for Linux System

July 17, 2020

Linux
Shell

It seems UCL only provides a detailed guide on how to set up the VPN for Mac and Windows, not for Linux. So here I record the script I am using to do it.

My ggplot2 Plots Code

July 12, 2020

R
ggplot2
visualisation

This is a note to record my quick code for drawing comparatively nice figures with ggplot2.

Bash Script to Generate Note Template

July 09, 2020

Shell
blog

My new personal website is working super well. However, every time I want to create a new note/post, I need to manually create a folder, then a markdown file, then copy and paste the header text from another post, and then change the name. So I wrote a simple bash script to automatically create the folder, file, and initial text for my new notes.

My Samtools Command

July 08, 2020

samtools

This is a note for my regularly used Samtools commands.

Regularly Backup MongoDB with R script

July 07, 2020

R
rds
Github

I am maintaining the GCGR website, so I think it is a good idea to back up the database regularly. My idea is to regularly run some R code (because I am only good at R) and scp/push the dumped files to RDS and Github separately.

My Command for PostgreSQL

July 06, 2020

postgreSQL

Recently one of my projects has been using PostgreSQL, so here I just record some common commands I use to work with PostgreSQL.

Gitpod for Website Update

July 01, 2020

Github
gitpod

After creating my Gatsby website, I need to constantly write notes for it. The normal way I do it is to open the project locally, run gatsby develop, etc., and then, after creating a new note, deploy and push. Here I tried a way to modify the code online, which is quicker and works from anywhere.

React Density Plot for Methylation Array

June 30, 2020

React
methylation array

In one of my collaboration projects, I sometimes need to show density plots to all members of the team. I think a good way is to put the figures online. But most density plots are just pure R figures, in which details are hard to identify. So I quickly created a React app that shows density plots online.

My Github Commands

June 29, 2020

Github

For many years, I only used commands like git add, git commit, git push, etc. Now I am collaborating with more and more professional people on Github, so I want to record the commands I have learned here. It is not systematic, but it may serve as a quick cheatsheet.

Download from EGA via Python

June 28, 2020

EGA

Since I am working on a project that uses EGA data, I need to download it from the website. After learning a bit, I successfully downloaded it. Here I record the process.

Backup and Restore MongoDB

June 28, 2020

MongoDB

Since I am developing the GCGR website, I constantly need to modify the database, so I often need to transfer the data from online to offline. Here I want to record the commands I use to dump the data and restore it.

Deploy Gatsby to Github User Page

June 28, 2020

gatsby
Github

After some time of development, I have now finished a basic version of my notes site. Next I want to deploy it on my Github. I like the idea of using XXXX.github.io as the domain name, which looks even better than using a purchased domain. So I need to find out how to do it.

Gatsby Get Current Url Pathname

June 28, 2020

gatsby

I want to make the layout title show a different font size depending on the page. For example, show a larger font on the note list page, but a small button on the blog page.

Gatsby-Personal-Website

June 27, 2020

gatsby
2020

I can't remember how many times I have re-created my personal website. This time I used Gatsby React, a nice framework for doing it.

Copyright © Yuan Tian 2023.