Update README.md (#2), opened by AntonV

Files changed (1): README.md (+49, -0)

A PyPI package and a working CLI tool will be available soon.

### As part of transformers

#### Generation with Text

```python
from transformers import AutoProcessor, DiaForConditionalGeneration

torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"

text = ["[S1] Dia is an open weights text to dialogue model."]
processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, padding=True, return_tensors="pt").to(torch_device)

model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=256)  # roughly 2 seconds of audio

# decode the generated tokens and save the audio to a file
outputs = processor.batch_decode(outputs)
processor.save_audio(outputs, "example.wav")
```
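
The comment above implies that 256 new tokens correspond to roughly two seconds of audio, i.e. on the order of 128 tokens per second. A minimal sketch of a helper (hypothetical, not part of the library) that derives `max_new_tokens` from a target duration under that assumption:

```python
# Hypothetical helper, assuming the ~128 tokens/second rate implied above
# (256 tokens for ~2 s); the exact rate may vary, so treat this as an estimate.
TOKENS_PER_SECOND = 128

def tokens_for_seconds(seconds: float) -> int:
    return int(seconds * TOKENS_PER_SECOND)

# e.g. roughly ten seconds of audio
outputs = model.generate(**inputs, max_new_tokens=tokens_for_seconds(10))
```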

#### Generation with Text and Audio (Voice Cloning)

```python
from datasets import load_dataset, Audio
from transformers import AutoProcessor, DiaForConditionalGeneration

torch_device = "cuda"
model_checkpoint = "buttercrab/dia-v1-1.6b"

# load a sample recording to use as the voice prompt
ds = load_dataset("hf-internal-testing/dailytalk-dummy", split="train")
ds = ds.cast_column("audio", Audio(sampling_rate=44100))
audio = ds[-1]["audio"]["array"]
# the text is the transcript of the audio prompt plus the additional text you want as new audio
text = ["[S1] I know. It's going to save me a lot of money, I hope. [S2] I sure hope so for you."]

processor = AutoProcessor.from_pretrained(model_checkpoint)
inputs = processor(text=text, audio=audio, padding=True, return_tensors="pt").to(torch_device)
prompt_len = processor.get_audio_prompt_len(inputs["decoder_attention_mask"])

model = DiaForConditionalGeneration.from_pretrained(model_checkpoint).to(torch_device)
outputs = model.generate(**inputs, max_new_tokens=256)  # roughly 2 seconds of audio

# keep only the newly generated audio (dropping the prompt) and save it to a file
outputs = processor.batch_decode(outputs, audio_prompt_len=prompt_len)
processor.save_audio(outputs, "example_with_audio.wav")
```
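
To clone from your own recording instead of the dataset sample, the audio prompt can come from a local file. A minimal sketch using soundfile, assuming a hypothetical mono 44.1 kHz recording `my_voice.wav` whose transcript you supply yourself:

```python
import soundfile as sf

# Hypothetical local recording; the example above uses 44.1 kHz audio,
# so resample beforehand if your file uses a different rate.
audio, sampling_rate = sf.read("my_voice.wav")
assert sampling_rate == 44100, "resample to 44.1 kHz first"

# transcript of my_voice.wav, followed by the new line to generate
text = ["[S1] <transcript of the recording> [S1] And this is the new sentence to synthesize."]
inputs = processor(text=text, audio=audio, padding=True, return_tensors="pt").to(torch_device)
```

The remaining steps (computing `prompt_len`, generating, and decoding) are identical to the snippet above.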

## 💻 Hardware and Inference Speed

Dia has been tested only on GPUs (PyTorch 2.0+, CUDA 12.6). CPU support will be added soon.
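
To verify that your environment matches these requirements, a quick check using only plain PyTorch:

```python
import torch

print(torch.__version__)          # should be 2.0 or newer
print(torch.version.cuda)         # CUDA version PyTorch was built against
print(torch.cuda.is_available())  # must be True until CPU support lands
```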