**Architecture & Training Configuration:**
- *Base Model Configuration*: This variant is built on the Llama2-7B architecture and configuration, providing a well-established foundation for the modifications described below.
- *Sequence Length Adaptation*: Data originally preprocessed for a sequence length of 2048 was detokenized and re-encoded for a sequence length of 4096, following the Megatron-LM preprocessing strategy. The longer context window lets the model handle longer sequences (a rough re-encoding sketch is shown after this list).
- *Batch Size & Token Management*: The global batch size was set to roughly 4 million tokens per step, matching the doubled sequence length while keeping data processing efficient (see the worked numbers after this list).
- *Integration of GQA (Grouped-Query Attention)*: To improve training and inference efficiency, the configuration uses grouped-query attention with 32 attention heads and a group size of 4, so several query heads share each key/value head (a minimal attention sketch is included after this list).
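
The sequence-length re-encoding step can be illustrated with a short sketch. This is not the project's actual preprocessing code: the tokenizer name, the `reencode` helper, and the simple concatenate-and-chunk packing are assumptions used only to show the detokenize-then-re-encode idea.

```python
# A minimal sketch (assumptions noted above): detokenize sequences that were
# packed for a 2048-token context and re-pack the text into 4096-token chunks.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed tokenizer

def reencode(old_sequences, new_seq_len=4096):
    """old_sequences: iterable of token-id lists originally prepared for length 2048."""
    buffer = []
    for ids in old_sequences:
        text = tokenizer.decode(ids, skip_special_tokens=True)           # detokenize
        buffer.extend(tokenizer.encode(text, add_special_tokens=False))  # re-encode
        while len(buffer) >= new_seq_len:                                # emit full 4096-token samples
            yield buffer[:new_seq_len]
            buffer = buffer[new_seq_len:]
    if buffer:                                                           # trailing partial chunk (pad or drop in practice)
        yield buffer
```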
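
For orientation on the batch-size figure: at a sequence length of 4096 tokens, a batch of roughly 4 million tokens corresponds to about 1,000 sequences per step (4,194,304 / 4,096 = 1,024 if "4 million" is read as 4 × 2^20 tokens). This is back-of-the-envelope arithmetic from the numbers above; the exact per-step sequence count is not stated here.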
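
To make the GQA setting concrete, below is a minimal grouped-query attention layer in PyTorch. It assumes a group size of 4 means four query heads share one key/value head (so the 32 query heads map to 8 key/value heads) and uses the Llama2-7B hidden size of 4096; the class and parameter names are illustrative rather than the model's actual implementation.

```python
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    """Minimal grouped-query attention: n_heads query heads share n_heads // group_size K/V heads."""
    def __init__(self, d_model=4096, n_heads=32, group_size=4):
        super().__init__()
        assert n_heads % group_size == 0
        self.n_heads = n_heads
        self.n_kv_heads = n_heads // group_size      # 32 heads / group size 4 -> 8 K/V heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, self.n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, self.n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each K/V head so every group of query heads attends to its shared K/V head.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))
```

Sharing key/value heads across each group shrinks the K/V projections and the KV cache relative to full multi-head attention, which is where the efficiency gain comes from.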