heartmula
Set up and run HeartMuLa, the open-source music generation model family (Suno-like). Generates full songs from lyrics + tags with multilingual support.
HeartMuLa is a family of open-source music foundation models (Apache-2.0) that generates music conditioned on lyrics and tags. It is an open-source alternative comparable to Suno. Includes:
Options for limited VRAM or multiple GPUs:

- `--lazy_load true` (loads/unloads models sequentially)
- `--mula_device cuda:0 --codec_device cuda:1` to split across GPUs

```bash
cd ~/  # or desired directory
git clone https://github.com/HeartMuLa/heartlib.git
cd heartlib
```
```bash
uv venv --python 3.10 .venv
. .venv/bin/activate
uv pip install -e .
```
IMPORTANT: As of Feb 2026, the pinned dependencies have conflicts with newer packages. Apply these fixes:
```bash
# Upgrade datasets (old version incompatible with current pyarrow)
uv pip install --upgrade datasets

# Upgrade transformers (needed for huggingface-hub 1.x compatibility)
uv pip install --upgrade transformers
```
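To confirm the upgrades took effect, a quick version check can be run inside the activated venv. This is a minimal sketch using only the standard library's `importlib.metadata`; the package list is an assumption covering the packages these notes mention, and it deliberately does not assert specific versions because the required ones are not stated here.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

# Distribution names mentioned in these notes (assumed list; extend as needed).
PACKAGES = ("datasets", "pyarrow", "transformers", "huggingface-hub", "torch", "torchtune")


def installed_versions(packages: tuple[str, ...] = PACKAGES) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:16} {ver or 'NOT INSTALLED'}")
```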
Patch 1 - RoPE cache fix in `src/heartlib/heartmula/modeling_heartmula.py`:

In the `setup_caches` method of the `HeartMuLa` class, add RoPE reinitialization after the `reset_caches` try/except block and before the `with device:` block:

```python
# Re-initialize RoPE caches that were skipped during meta-device loading
from torchtune.models.llama3_1._position_embeddings import Llama3ScaledRoPE
for module in self.modules():
    if isinstance(module, Llama3ScaledRoPE) and not module.is_cache_built:
        module.rope_init()
        module.to(device)
```
Why: `from_pretrained` creates the model on the meta device first; `Llama3ScaledRoPE.rope_init()` skips cache building on meta tensors and never rebuilds the cache after the weights are loaded onto the real device.
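Editing the file by hand is fine; for repeatable setups, a small hypothetical helper can apply the same edit. This sketch assumes (unverified against heartlib) that `setup_caches` contains a line whose stripped text is exactly `with device:`, placed after the `reset_caches` try/except. It matches that line's indentation and uses the snippet's first comment as an "already patched" marker so reruns are harmless. `insert_rope_fix` and `patch_file` are illustrative names, not part of heartlib.

```python
from pathlib import Path

# The snippet's comment doubles as a marker for "already patched".
MARKER = "# Re-initialize RoPE caches that were skipped during meta-device loading"

ROPE_FIX = f"""{MARKER}
from torchtune.models.llama3_1._position_embeddings import Llama3ScaledRoPE
for module in self.modules():
    if isinstance(module, Llama3ScaledRoPE) and not module.is_cache_built:
        module.rope_init()
        module.to(device)
"""


def insert_rope_fix(source: str) -> str:
    """Insert ROPE_FIX just before the first `with device:` line inside setup_caches."""
    if MARKER in source:
        return source  # already patched
    lines = source.splitlines(keepends=True)
    in_setup_caches = False
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("def setup_caches("):
            in_setup_caches = True
        elif in_setup_caches and stripped.startswith("def "):
            break  # left setup_caches without finding the target line
        elif in_setup_caches and stripped == "with device:":
            indent = line[: len(line) - len(line.lstrip())]
            lines.insert(i, "".join(indent + fix + "\n" for fix in ROPE_FIX.splitlines()))
            return "".join(lines)
    raise ValueError("could not find `with device:` inside setup_caches")


def patch_file(path: str = "src/heartlib/heartmula/modeling_heartmula.py") -> None:
    """Rewrite the file in place (hypothetical convenience wrapper)."""
    file = Path(path)
    file.write_text(insert_rope_fix(file.read_text()))
```

Run `patch_file()` from the heartlib root, then review the diff with `git diff` before generating.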
Patch 2 - HeartCodec loading fix in `src/heartlib/pipelines/music_generation.py`:

Add `ignore_mismatched_sizes=True` to ALL `HeartCodec.from_pretrained()` calls (there are 2: the eager load in `__init__` and the lazy load in the `codec` property).
Why: the VQ codebook `initted` buffers have shape `[1]` in the checkpoint vs `[]` in the model. The data is the same; it is only stored as a one-element tensor instead of a 0-d (scalar) tensor, so the mismatch is safe to ignore.
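The same keyword can also be added programmatically. This is a hypothetical helper, not part of heartlib: it finds each `HeartCodec.from_pretrained(` call, walks to the matching closing parenthesis, and appends `ignore_mismatched_sizes=True` unless the call already mentions it. As a stated simplification, it does not handle parentheses inside string literals. It returns the number of calls it changed, which should be 2 for the file described above.

```python
from __future__ import annotations

import re
from pathlib import Path

_CALL = re.compile(r"HeartCodec\.from_pretrained\(")


def add_ignore_mismatched_sizes(source: str) -> tuple[str, int]:
    """Append ignore_mismatched_sizes=True to every HeartCodec.from_pretrained(...) call."""
    pieces, pos, patched = [], 0, 0
    for match in _CALL.finditer(source):
        if match.start() < pos:
            continue  # nested inside a call that was already rewritten
        depth, i = 1, match.end()
        while depth:  # scan forward to the matching ")"
            if i >= len(source):
                raise ValueError("unbalanced parentheses in HeartCodec.from_pretrained call")
            if source[i] == "(":
                depth += 1
            elif source[i] == ")":
                depth -= 1
            i += 1
        close = i - 1  # index of the matching ")"
        args = source[match.end():close]
        body = args.rstrip()
        tail = args[len(body):]  # keep any newline/indent before ")"
        if "ignore_mismatched_sizes" not in args:
            if body and not body.endswith(","):
                body += ","
            body = f"{body} ignore_mismatched_sizes=True" if body else "ignore_mismatched_sizes=True"
            patched += 1
        pieces.append(source[pos:match.end()] + body + tail + ")")
        pos = close + 1
    pieces.append(source[pos:])
    return "".join(pieces), patched


def patch_file(path: str = "src/heartlib/pipelines/music_generation.py") -> int:
    """Rewrite the file in place and return how many calls were patched."""
    file = Path(path)
    new_source, count = add_ignore_mismatched_sizes(file.read_text())
    file.write_text(new_source)
    return count
```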
```bash
cd heartlib  # project root
hf download --local-dir './ckpt' 'HeartMuLa/HeartMuLaGen'
hf download --local-dir './ckpt/HeartMuLa-oss-3B' 'HeartMuLa/HeartMuLa-oss-3B-happy-new-year'
hf download --local-dir './ckpt/HeartCodec-oss' 'HeartMuLa/HeartCodec-oss-20260123'
```
All 3 can be downloaded in parallel. Total size is several GB.
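One way to run the three downloads concurrently from Python is to launch the same `hf download` commands in a thread pool. This is a sketch that assumes the `hf` CLI is on `PATH` and that it is run from the heartlib root; the repo IDs and target directories are copied from the commands above. The `run` parameter is injectable so the function can be exercised without touching the network.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# (local_dir, repo_id) pairs copied from the hf download commands above.
DOWNLOADS = [
    ("./ckpt", "HeartMuLa/HeartMuLaGen"),
    ("./ckpt/HeartMuLa-oss-3B", "HeartMuLa/HeartMuLa-oss-3B-happy-new-year"),
    ("./ckpt/HeartCodec-oss", "HeartMuLa/HeartCodec-oss-20260123"),
]


def download_commands(downloads=DOWNLOADS):
    """Build one `hf download` argv per checkpoint."""
    return [["hf", "download", "--local-dir", local_dir, repo_id] for local_dir, repo_id in downloads]


def download_all(run=subprocess.run, downloads=DOWNLOADS):
    """Run all downloads concurrently; raises if any command fails (check=True)."""
    commands = download_commands(downloads)
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        futures = [pool.submit(run, cmd, check=True) for cmd in commands]
        return [future.result() for future in futures]
```

Calling `download_all()` with the default `subprocess.run` performs the real downloads; threads are sufficient here because the work happens in the child processes.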
HeartMuLa uses CUDA by default (`--mula_device cuda --codec_device cuda`). No extra setup is needed if the user has an NVIDIA GPU with PyTorch CUDA support installed.
- `torch==2.4.1` includes CUDA 12.1 support out of the box.
- `torchtune` may report version `0.4.0+cpu`; this is just package metadata, and it still uses CUDA via PyTorch.
- To run without a GPU, pass `--mula_device cpu --codec_device cpu`, but expect generation to be extremely slow (potentially 30-60+ minutes for a single song vs ~4 minutes on GPU). CPU mode also requires significant RAM (~12 GB+ free). If the user has no NVIDIA GPU, recommend a cloud GPU service (Google Colab free tier with T4, Lambda Labs, etc.) or the online demo at https://heartmula.github.io/ instead.

To generate a song:

```bash
cd heartlib
. .venv/bin/activate
python ./examples/run_music_generation.py \
  --model_path=./ckpt \
  --version="3B" \
  --lyrics="./assets/lyrics.txt" \
  --tags="./assets/tags.txt" \
  --save_path="./assets/output.mp3" \
  --lazy_load true
```
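The device guidance above can be folded into a small command builder. This is a sketch, not part of heartlib: `device_flags`, `build_command`, and `detect_gpu_count` are hypothetical names, and the mapping (no GPU to CPU, one GPU to the `cuda` default, two or more to a `cuda:0`/`cuda:1` split) simply encodes the options described in these notes. Only `torch.cuda.is_available()` and `torch.cuda.device_count()` are called from PyTorch.

```python
from __future__ import annotations


def device_flags(gpu_count: int) -> list[str]:
    """Pick --mula_device/--codec_device values from the number of visible GPUs."""
    if gpu_count <= 0:
        return ["--mula_device", "cpu", "--codec_device", "cpu"]  # very slow, ~12 GB+ RAM
    if gpu_count == 1:
        return ["--mula_device", "cuda", "--codec_device", "cuda"]  # the default
    return ["--mula_device", "cuda:0", "--codec_device", "cuda:1"]  # split across GPUs


def build_command(
    gpu_count: int,
    lyrics: str = "./assets/lyrics.txt",
    tags: str = "./assets/tags.txt",
    save_path: str = "./assets/output.mp3",
    lazy_load: bool = True,
) -> list[str]:
    """Assemble the run_music_generation.py argv (run from the heartlib root)."""
    cmd = [
        "python", "./examples/run_music_generation.py",
        "--model_path=./ckpt",
        "--version=3B",
        f"--lyrics={lyrics}",
        f"--tags={tags}",
        f"--save_path={save_path}",
        *device_flags(gpu_count),
    ]
    if lazy_load:
        cmd += ["--lazy_load", "true"]  # omitted otherwise, leaving the default (false)
    return cmd


def detect_gpu_count() -> int:
    """Count CUDA devices via PyTorch; 0 if torch is missing or CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return 0
    return torch.cuda.device_count() if torch.cuda.is_available() else 0
```

With the venv active, `subprocess.run(build_command(detect_gpu_count()), check=True)` launches the same generation as the shell command above.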
Tags (comma-separated, no spaces):

```
piano,happy,wedding,synthesizer,romantic
```

or

```
rock,energetic,guitar,drums,male-vocal
```
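A tags file in this format can be produced from a Python list. The sketch below (hypothetical `format_tags`) strips surrounding whitespace and drops empty entries; multi-word tags are rejected rather than silently rewritten, since the expected spelling (for example `male-vocal`) is a choice the user should make.

```python
from typing import Iterable


def format_tags(tags: Iterable[str]) -> str:
    """Join tags comma-separated with no spaces, e.g. 'piano,happy,wedding'."""
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    bad = [tag for tag in cleaned if "," in tag or any(ch.isspace() for ch in tag)]
    if bad:
        # Rejected rather than rewritten; hyphenate multi-word tags yourself.
        raise ValueError(f"tags may not contain commas or inner spaces: {bad}")
    return ",".join(cleaned)
```

The result can be written with `Path("./assets/tags.txt").write_text(format_tags([...]))`.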
Lyrics (use bracketed structural tags):

```
[Intro]

[Verse]
Your lyrics here...

[Chorus]
Chorus lyrics...

[Bridge]
Bridge lyrics...

[Outro]
```
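A lyrics file with these structural tags can be generated from `(tag, text)` pairs. This is a minimal sketch with a hypothetical `format_lyrics` helper; placing each tag on its own line followed by that section's lyrics matches the layout above, while the blank line between sections is an assumption made for readability, not a documented requirement.

```python
from typing import Iterable, Tuple


def format_lyrics(sections: Iterable[Tuple[str, str]]) -> str:
    """Render (tag, text) pairs as bracketed section headers followed by their lines."""
    blocks = []
    for tag, text in sections:
        header = f"[{tag}]"
        body = text.strip()
        blocks.append(f"{header}\n{body}" if body else header)
    # Blank line between sections: a readability choice, not a documented rule.
    return "\n\n".join(blocks) + "\n"
```

For example, `format_lyrics([("Intro", ""), ("Verse", "Your lyrics here...")])` yields a file body ready for `./assets/lyrics.txt`.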
| Parameter | Default | Description |
|---|---|---|
|  | 240000 | Max length in ms (240000 ms = 240 s = 4 min) |
|  | 50 | Top-k sampling |
|  | 1.0 | Sampling temperature |
|  | 1.5 | Classifier-free guidance scale |
| `--lazy_load` | false | Load/unload models on demand (saves VRAM) |
|  | bfloat16 | Dtype for HeartMuLa (bf16 recommended) |
|  | float32 | Dtype for HeartCodec (fp32 recommended for quality) |
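To illustrate what the top-k, temperature, and guidance-scale defaults conventionally control, here is a generic sketch of one sampling step over a list of logits. It is not HeartMuLa's implementation (its internals are not described here); it assumes the common classifier-free guidance form `uncond + scale * (cond - uncond)`, followed by top-k filtering and a temperature-scaled softmax. With scale 1.0 guidance returns the conditional logits unchanged; values above 1.0 push further toward the lyrics/tags condition.

```python
import math
import random
from typing import List, Optional, Sequence


def cfg_logits(cond: Sequence[float], uncond: Sequence[float], scale: float) -> List[float]:
    """Classifier-free guidance in its common form: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


def sample_top_k(
    logits: Sequence[float],
    k: int = 50,
    temperature: float = 1.0,
    rng: Optional[random.Random] = None,
) -> int:
    """Keep the k highest logits, divide by temperature, and sample from their softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[: max(1, k)]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]
```

Lower temperatures and smaller `k` make output more conservative; `k=1` reduces to greedy decoding.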
MIT
```bash
mkdir -p ~/.hermes/skills/media/heartmula && \
  curl -o ~/.hermes/skills/media/heartmula/SKILL.md \
  https://raw.githubusercontent.com/NousResearch/hermes-agent/main/skills/media/heartmula/SKILL.md
```
© 2026 Torly.ai. All rights reserved.