Beyond Training Context Length
Transformer-based language models have a context length limit during training. Let's denote this length as `L`. Each training example is a chunk of `L` tokens sampled from the training dataset. As you can imagine, the larger `L` is, the more compute training requires (self-attention cost grows quadratically with sequence length) and the slower training becomes. But at inference time, it would be great if the model's context length could exceed the training context length...
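To make the sampling step concrete, here is a minimal sketch (the function name `sample_training_examples` is hypothetical and not tied to any particular framework) of how a fixed training context length `L` shapes data preparation: each training example is just a contiguous run of `L` tokens taken from the tokenized corpus.

```python
import numpy as np

def sample_training_examples(token_ids, L, batch_size, rng=None):
    """Draw `batch_size` contiguous chunks of `L` tokens from a token stream.

    Hypothetical helper for illustration: real training pipelines differ,
    but the key constraint is the same -- every example has length `L`.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Pick random start offsets, then slice out L tokens from each.
    starts = rng.integers(0, len(token_ids) - L, size=batch_size)
    return np.stack([token_ids[s : s + L] for s in starts])

# Toy usage: a fake token stream and a training context length of L = 8.
tokens = np.arange(1000)
batch = sample_training_examples(tokens, L=8, batch_size=4)
print(batch.shape)  # (4, 8)
```

Under this setup the model never sees a sequence longer than `L` during training, which is exactly why getting it to handle longer contexts at inference time is non-trivial.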
Date: January 31, 2025