Reference DeepSeek-V3

Positron-specific reference implementation of DeepSeek-V3. The original README is in README_DEEPSEEK.md.

The main goal of this repository is to execute the DeepSeek-V3 model while instrumenting it to save intermediates, so that we can validate the Positron implementation.

Step 1: Convert the weights

The HuggingFace weights are in a format that this repo's code can't directly load. Ergo, needs must convert:

# On a machine with some GPUs
cd inference
python3 convert.py \
  --hf-ckpt-path /opt/positron/weights/huggingface/deepseek-ai/DeepSeek-V3/ \
  --save-path /opt/positron/weights/huggingface/positron-ai/DeepSeek-V3-pt-converted-4-layers/ \
  --n-experts 256 \
  --n-layers 4 \
  --model-parallel 1

I believe DeepSeek's original purpose for the conversion script was to create sharded weight sets to be able to run the model distributed with torchrun, but as long as we're willing to limit the number of layers we don't need to do that.

Step 2: Generate

# On a machine with some GPUs
cd inference
python3 generate.py \
  --ckpt-path /opt/positron/weights/huggingface/positron-ai/DeepSeek-V3-pt-converted-4-layers/ \
  --config configs/config_671B_4_layers.json \
  --input-file requirements.txt \
  --max-new-tokens 10

The input file is just something I happened to have handy. Looks like the driver script will interpret each line of that file as a separate prompt and try to complete all of them.

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
.github		.github
figures		figures
inference		inference
.gitignore		.gitignore
CITATION.cff		CITATION.cff
DeepSeek_V3.pdf		DeepSeek_V3.pdf
LICENSE-CODE		LICENSE-CODE
LICENSE-MODEL		LICENSE-MODEL
README.md		README.md
README_DEEPSEEK.md		README_DEEPSEEK.md
README_WEIGHTS.md		README_WEIGHTS.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Reference DeepSeek-V3

Step 1: Convert the weights

Step 2: Generate

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Reference DeepSeek-V3

Step 1: Convert the weights

Step 2: Generate

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages