# Quick Start Guide

Get the HTTrack Website Scraper running in under 5 minutes!

## 🚀 Fastest Way (Docker)

```bash
# 1. Make the deploy script executable
chmod +x deploy.sh

# 2. Deploy everything
./deploy.sh deploy

# 3. Open your browser
# Go to: http://localhost:7860
```

That's it! 🎉
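To confirm the deployment actually came up, you can check the container and probe the port. The container name below matches the `docker` commands used elsewhere in this guide; adjust it if your `deploy.sh` uses a different one:

```bash
# Confirm the container is running
docker ps --filter "name=httrack-scraper"

# Probe the web UI (should return an HTTP success status)
curl -I http://localhost:7860
```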
## 🐳 Manual Docker Setup

If you prefer step-by-step:

```bash
# Build the image
docker build -t httrack-scraper .

# Run the container
docker run -d -p 7860:7860 --name httrack-scraper httrack-scraper

# Access at http://localhost:7860
```
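If you want scraped sites to survive container restarts, you can mount a host directory over the app's download location. The in-container path below (`/app/output`) is an assumption; verify the real path against the Dockerfile:

```bash
# Persist downloads to ./output on the host
# NOTE: /app/output is an assumed path -- check the Dockerfile
docker run -d -p 7860:7860 \
  -v "$PWD/output:/app/output" \
  --name httrack-scraper httrack-scraper
```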
## 🐍 Local Python Setup

Without Docker:

```bash
# 1. Install HTTrack
sudo apt-get install httrack   # Ubuntu/Debian
brew install httrack           # macOS

# 2. Install Python packages
pip install -r requirements.txt

# 3. Run the app
python app.py

# 4. Open http://localhost:7860
```
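To keep the app's dependencies separate from your system Python, it's worth doing steps 2-3 inside a virtual environment:

```bash
# Create and activate an isolated environment first
python3 -m venv .venv
source .venv/bin/activate

# Then install and run as above
pip install -r requirements.txt
python app.py
```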
## 🎯 First Scrape

1. Enter a URL: `https://example.com`
2. Set Max Depth: `2` (a good starting point)
3. Click "Start Scraping"
4. Watch the progress and logs
5. Download your ZIP file when complete
## ⚙️ Recommended Settings for First Time

| Setting | Value | Why |
|---------|-------|-----|
| Max Depth | 2-3 | Fast; won't download too much |
| Max Rate | 500000 (bytes/sec) | Respectful to servers |
| Respect robots.txt | ✅ Checked | Ethical scraping |
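For reference, these settings correspond to HTTrack's own command-line flags, so a roughly equivalent direct invocation would look like the following (the output directory is illustrative, and the app's exact flags may differ):

```bash
# -r3: mirror depth 3
# -A500000: cap transfer rate at 500,000 bytes/sec
# -s2: always obey robots.txt
httrack https://example.com -O ./mirror -r3 -A500000 -s2
```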
## 🛠️ Common Commands

```bash
# View logs
docker logs -f httrack-scraper

# Stop the app
docker stop httrack-scraper

# Restart the app
docker restart httrack-scraper

# Or use the deploy script
./deploy.sh logs     # View logs
./deploy.sh stop     # Stop app
./deploy.sh restart  # Restart app
./deploy.sh clean    # Remove everything
```
## ⚠️ Before You Start

**Important Reminders:**
- ✅ Only scrape websites you have permission to access
- ✅ Respect robots.txt files
- ✅ Be aware of copyright laws
- ✅ Use reasonable rate limits
- ✅ Check the website's Terms of Service

**Good Use Cases:**
- Backing up your own website
- Archiving with permission
- Research projects (with authorization)
- Personal offline browsing (legally obtained content)
## 🔧 Quick Troubleshooting

### Port Already in Use
```bash
# Use a different port
docker run -d -p 8080:7860 --name httrack-scraper httrack-scraper
# Then visit http://localhost:8080
```
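If you'd rather free port 7860 than move to another one, first see what's holding it:

```bash
# Identify the process bound to port 7860 (Linux/macOS)
lsof -i :7860

# or, on Linux:
ss -ltnp | grep 7860
```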
### HTTrack Not Found (Local Installation)
```bash
# Install it first
sudo apt-get update
sudo apt-get install httrack
```
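You can confirm the binary is on your `PATH` afterwards; if this prints nothing, the app won't find it either:

```bash
# Should print something like /usr/bin/httrack
which httrack
```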
### Container Won't Start
```bash
# Check the logs
docker logs httrack-scraper

# Try rebuilding
docker stop httrack-scraper
docker rm httrack-scraper
docker build -t httrack-scraper .
docker run -d -p 7860:7860 --name httrack-scraper httrack-scraper
```
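If the logs are empty, the container's exit code often narrows things down:

```bash
# Show the container's state and exit code
docker ps -a --filter "name=httrack-scraper"
docker inspect --format '{{.State.ExitCode}}' httrack-scraper
```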
## 📚 Next Steps

- Read the full [README.md](README.md) for detailed documentation
- Experiment with different depth settings
- Try scraping progressively larger sites
- Check out the logs to understand what's happening
## 💡 Tips for Better Results

1. **Start Small**: Test with depth 1-2 first
2. **Increase Gradually**: Only go deeper if needed
3. **Monitor Logs**: Watch for errors or issues
4. **Be Patient**: Large sites take time
5. **Check Size**: Monitor disk space for big scrapes (see the snippet below)
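A quick way to keep an eye on tip 5 while a scrape runs; the `output/` directory name is an assumption, so point it at wherever your downloads actually land:

```bash
# Re-check the scrape's size and free disk space every 30 seconds
watch -n 30 'du -sh output/ && df -h .'
```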
## 📖 Learning Resources

- HTTrack Documentation: https://www.httrack.com/html/
- Gradio Documentation: https://www.gradio.app/docs/
- Docker Documentation: https://docs.docker.com/

---

**Need Help?** Check the troubleshooting section in the main README or review the logs for error messages.

**Ready to scrape?** Run `./deploy.sh deploy` and start downloading! 🚀