---
license: apache-2.0
language:
- ps
- en
library_name: transformers
pipeline_tag: text-classification
tags:
- pashto
- afghanistan
- zamai
- conversational-ai
- instruction-tuning
datasets:
- tasal9/ZamAI_Pashto_Dataset
metrics:
- perplexity
- bleu
widget:
- text: سلام دې وي! تاسو څنګه یاست؟
  example_title: Pashto Greeting
- text: د افغانستان د تاریخ په اړه راته ووایه
  example_title: Afghanistan History
- text: Hello, how can I help you today?
  example_title: English Greeting
---

# ZamAI-Sentiment-Pashto

## Model Description

A sentiment analysis model for Pashto text classification and emotion detection.

This model is part of the ZamAI (زمای) project, an Afghan AI assistant designed to understand and communicate in Pashto, English, and other Afghan languages.

## Key Features

- Multi-class sentiment classification
- Emotion detection
- Cultural context understanding
- Social media text analysis
- Real-time sentiment scoring

## Use Cases

- Social media monitoring
- Customer feedback analysis
- Content moderation
- Market research
- Opinion mining

## Model Architecture

- **Base Model:** cardiffnlp/twitter-roberta-base-sentiment-latest
- **Architecture:** roberta
- **Task:** sentiment-analysis
- **Languages:** Pashto (ps), English (en)

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load the tokenizer and the classification model
model_name = "tasal9/zamai-sentiment-pashto"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.eval()

# Classify a Pashto sentence
# (English: "Hello! I have a question about Afghanistan:")
text = "سلام! زه د افغانستان په اړه پوښتنه لرم:"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Convert logits to probabilities and report the top label
probs = torch.softmax(logits, dim=-1)[0]
predicted_id = int(probs.argmax())
label = model.config.id2label[predicted_id]
print(f"{label}: {probs[predicted_id].item():.3f}")
```

## Training Details

- **Dataset:** ZamAI Pashto Dataset (tasal9/ZamAI_Pashto_Dataset)
- **Training Method:** Classification fine-tuning
- **Epochs:** 6
- **Batch Size:** 8
- **Learning Rate:** 2e-5

## Performance

The model was trained on conversational Pashto data and is reported to perform well in:

- Natural conversation flow
- Cultural context understanding
- Mixed language handling (code-switching)
- Afghan cultural knowledge

## Limitations

- Primary focus on Pashto and English
- May require further fine-tuning for specific domains
- Performance may vary with complex technical terminology

## Ethical Considerations

This model is designed to respect Afghan and Islamic values, promoting positive and constructive conversations while avoiding harmful or inappropriate content.

## Citation

```bibtex
@misc{zamai_zamai_sentiment_pashto_2024,
  title={ZamAI ZamAI-Sentiment-Pashto: Advanced Pashto Language Model},
  author={ZamAI Team},
  year={2024},
  publisher={Hugging Face},
  url={https://huggingface.co/tasal9/zamai-sentiment-pashto}
}
```

## Contact

For questions, suggestions, or collaboration opportunities, please reach out through the ZamAI project.

---

*Built with ❤️ for the Afghan community*
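
## Example: Scoring Classifier Logits

The multi-class sentiment classification and sentiment scoring listed under Key Features come down to turning per-class logits into a labelled probability. The sketch below shows that step in plain Python. It assumes the three-label scheme (negative, neutral, positive) of the base model; the fine-tuned checkpoint's actual labels are not stated in this card and should be read from `model.config.id2label`. The names `ASSUMED_LABELS` and `score_sentiment`, and the sample logits, are hypothetical and used only for illustration.

```python
import math

# Assumed label order, borrowed from the base model's scheme. The fine-tuned
# checkpoint may differ, so check model.config.id2label in practice.
ASSUMED_LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentiment(logits, labels=ASSUMED_LABELS):
    """Return (label, probability) for the highest-scoring class."""
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Hypothetical logits, not real model output.
label, prob = score_sentiment([-1.2, 0.3, 2.1])
print(f"{label}: {prob:.3f}")
```

Returning the probability alongside the label lets a caller apply its own confidence threshold, for example treating low-confidence predictions as neutral in a monitoring dashboard.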