Space: alakxender/dhivehi-tts-demos

alakxender committed on Jul 7

Commit 7e668b3 (1 parent: 6cde794)

t

Files changed (1)

dia_1_6B_dv.py CHANGED

@@ -370,21 +370,21 @@ def get_dia_1_6B_tab():
             )
         else:
             gr.Markdown("_(No examples configured or example prompt file missing)_")
-    gr.Markdown(
-        "---\n"
-        "**General Guidelines:**\n"
-    "- Keep input text length moderate\n"
-    "  - Short input (corresponding to under 5s of audio) will sound unnatural\n"
-    "  - Very long input (corresponding to over 20s of audio) will make the speech unnaturally fast\n\n"
-    "- Use non-verbal tags sparingly, from the list in the README. Overusing or using unlisted non-verbals may cause weird artifacts\n\n"
-    "- Always begin input text with [S1], and always alternate between [S1] and [S2] (i.e. [S1]... [S1]... is not good)\n\n"
-    "**When using audio prompts (voice cloning):**\n"
-    "- Provide the transcript of the to-be cloned audio before the generation text\n"
-    "- Transcript must use [S1], [S2] speaker tags correctly:\n"
-    "  - Single speaker: [S1]...\n"
-    "  - Two speakers: [S1]... [S2]...\n"
-    "- Duration of the to-be cloned audio should be 5~10 seconds for the best results\n"
-    "  - (Keep in mind: 1 second ≈ 86 tokens)\n"
-    "- Put [S1] or [S2] (the second-to-last speaker's tag) at the end of the audio to improve audio quality at the end"
-    )
+            gr.Markdown(
+                "---\n"
+                "**General Guidelines:**\n"
+            "- Keep input text length moderate\n"
+            "  - Short input (corresponding to under 5s of audio) will sound unnatural\n"
+            "  - Very long input (corresponding to over 20s of audio) will make the speech unnaturally fast\n\n"
+            "- Use non-verbal tags sparingly, from the list in the README. Overusing or using unlisted non-verbals may cause weird artifacts\n\n"
+            "- Always begin input text with [S1], and always alternate between [S1] and [S2] (i.e. [S1]... [S1]... is not good)\n\n"
+            "**When using audio prompts (voice cloning):**\n"
+            "- Provide the transcript of the to-be cloned audio before the generation text\n"
+            "- Transcript must use [S1], [S2] speaker tags correctly:\n"
+            "  - Single speaker: [S1]...\n"
+            "  - Two speakers: [S1]... [S2]...\n"
+            "- Duration of the to-be cloned audio should be 5~10 seconds for the best results\n"
+            "  - (Keep in mind: 1 second ≈ 86 tokens)\n"
+            "- Put [S1] or [S2] (the second-to-last speaker's tag) at the end of the audio to improve audio quality at the end"
+            )
     # No explicit return needed for context manager pattern
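
The guideline text moved in this hunk states two checkable rules: input should begin with [S1] and alternate between [S1] and [S2], and one second of audio corresponds to roughly 86 tokens. A minimal sketch of how those rules could be checked before generation follows. The helper names `check_speaker_tags` and `estimate_tokens` are hypothetical and are not part of `dia_1_6B_dv.py`; only the tag convention and the 86 tokens-per-second figure come from the guideline strings above.

```python
import re

# Taken from the guideline text: 1 second of audio is roughly 86 tokens.
TOKENS_PER_SECOND = 86

# Matches the two speaker tags named in the guidelines.
SPEAKER_TAG = re.compile(r"\[S[12]\]")


def check_speaker_tags(text: str) -> list[str]:
    """Return a list of guideline violations; an empty list means none found."""
    problems = []
    if not text.lstrip().startswith("[S1]"):
        problems.append("input should begin with [S1]")
    tags = SPEAKER_TAG.findall(text)
    # Compare each tag with the one that follows it; report the first repeat.
    for prev, cur in zip(tags, tags[1:]):
        if prev == cur:
            problems.append(f"{cur} repeats; tags should alternate")
            break
    return problems


def estimate_tokens(seconds: float) -> int:
    """Rough token count for an audio prompt of the given duration."""
    return round(seconds * TOKENS_PER_SECOND)


if __name__ == "__main__":
    print(check_speaker_tags("[S1] Hello. [S2] Hi there."))
    print(check_speaker_tags("[S1] Hello. [S1] Hello again."))
    # The recommended 5 to 10 second prompt range, expressed in tokens.
    print(estimate_tokens(5), estimate_tokens(10))
```

Under these assumptions, the recommended 5 to 10 second audio prompt works out to roughly 430 to 860 tokens.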