Welcome to Liyuan’s Log

This is the place where I share what I learn!

Retrieval System

A retrieval system, from the write path to the read path...

LLM Agent

Let's talk about LLM agents...

Inference Time Compute

Use more inference-time compute to get better performance...

Vision Language Model

A few papers about vision language models...

Reward Model

Reward models are widely used for ranking LLM responses and for preference alignment...

My Take on the Second Half of AI

I read "The Second Half of AI" multiple times and summarized my takeaways...

Policy Gradient and PPO from Scratch

Let's implement Policy Gradient and PPO from scratch...

Supervised Fine Tuning From Scratch

Let's implement SFT from scratch...

Attention Backpropagation

How to backpropagate through the attention operation...

3 Questions About QLoRA

QLoRA is a memory-efficient and parameter-efficient fine-tuning approach...

8-bit Optimizer

Use much less memory during training...

What I Learned from Training DeepSeek V2

Beyond Training Context Length

Transformer-based language models have a fixed context length during training. Let's denote this length as `L`: each training example is formed by sampling `L` tokens from the training dataset. As you can imagine, the larger `L` is, the more compute training requires and the slower it runs. But at inference time, it would be great if the model could handle contexts longer than the training context length...
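As a minimal sketch of that setup (the helper name here is hypothetical, not from the post): each training example is just a random `L`-token window from the tokenized corpus, and per-example compute grows with `L` since attention alone costs `O(L^2)`.

```python
import random

# Minimal sketch (hypothetical helper): build one training example
# by sampling a window of exactly L consecutive tokens.
def sample_training_example(token_ids: list[int], L: int) -> list[int]:
    start = random.randint(0, len(token_ids) - L)  # random window start
    return token_ids[start : start + L]

# Stand-in for a tokenized corpus. Larger L means more compute per
# example (attention alone is O(L^2)) and therefore slower training.
corpus = list(range(100_000))
example = sample_training_example(corpus, L=2048)
assert len(example) == 2048
```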

Flash Attention

Flash Attention is a faster way to compute the attention operation...