namelessai committed on
Commit 58e33be · verified · 1 Parent(s): d8efa31

Create QUICKSTART.md

Files changed (1)
  1. QUICKSTART.md +155 -0
QUICKSTART.md ADDED
@@ -0,0 +1,155 @@
+ # Quick Start Guide
+
+ Get the HTTrack Website Scraper running in under 5 minutes!
+
+ ## 🚀 Fastest Way (Docker)
+
+ ```bash
+ # 1. Make the deploy script executable
+ chmod +x deploy.sh
+
+ # 2. Deploy everything
+ ./deploy.sh deploy
+
+ # 3. Open your browser
+ # Go to: http://localhost:7860
+ ```
+
+ That's it! 🎉
+
+ ## 📝 Manual Docker Setup
+
+ If you prefer step-by-step:
+
+ ```bash
+ # Build the image
+ docker build -t httrack-scraper .
+
+ # Run the container
+ docker run -d -p 7860:7860 --name httrack-scraper httrack-scraper
+
+ # Access at http://localhost:7860
+ ```
+
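`docker run -d` returns as soon as the container is created, so the web UI may need a few seconds before http://localhost:7860 responds. A minimal sketch of a polling helper, assuming `curl` is installed; `retry` is a hypothetical helper name and is not part of the project or `deploy.sh`:

```bash
# retry ATTEMPTS DELAY COMMAND...
# Run COMMAND until it succeeds, at most ATTEMPTS times,
# sleeping DELAY seconds between failed tries.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt is coming.
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (not run here): poll the UI once per second, up to 30 times.
# retry 30 1 curl -fsS -o /dev/null http://localhost:7860
```

The helper takes the command as arguments rather than hard-coding `curl`, so the same loop works for any readiness check.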
+ ## 🐍 Local Python Setup
+
+ Without Docker:
+
+ ```bash
+ # 1. Install HTTrack (run only the line for your OS)
+ sudo apt-get install httrack   # Ubuntu/Debian
+ brew install httrack           # macOS
+
+ # 2. Install Python packages
+ pip install -r requirements.txt
+
+ # 3. Run the app
+ python app.py
+
+ # 4. Open http://localhost:7860
+ ```
+
+ ## 🎯 First Scrape
+
+ 1. Enter a URL: `https://example.com`
+ 2. Set Max Depth: `2` (good starting point)
+ 3. Click "Start Scraping"
+ 4. Watch the progress and logs
+ 5. Download your ZIP file when complete
+
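For orientation, the form fields in these steps correspond roughly to a plain HTTrack command line. The sketch below only prints the command; it does not run it. How `app.py` actually invokes HTTrack is not shown in this guide, so the mapping, the `./mirror` output directory, and the `build_httrack_cmd` helper name are all assumptions. The flags themselves are standard HTTrack options: `-O` (output path), `-rN` (mirror depth), `-AN` (maximum transfer rate in bytes per second), and `-sN` (robots.txt handling, where `2` means always follow).

```bash
# Print (not execute) an HTTrack command for a URL, depth, and max rate.
# Depth defaults to 2 and rate to 500000, the starting values used in this guide.
build_httrack_cmd() {
  url=$1
  depth=${2:-2}
  rate=${3:-500000}
  printf 'httrack %s -O ./mirror -r%s -A%s -s2\n' "$url" "$depth" "$rate"
}

build_httrack_cmd https://example.com 2 500000
# prints: httrack https://example.com -O ./mirror -r2 -A500000 -s2
```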
+ ## ⚙️ Recommended Settings for First Time
+
+ | Setting | Value | Why |
+ |---------|-------|-----|
+ | Max Depth | 2-3 | Fast, won't download too much |
+ | Max Rate | 500000 | Respectful to servers |
+ | Respect robots.txt | ✅ Checked | Ethical scraping |
+
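To get a feel for the Max Rate value: HTTrack's own `--max-rate` option is expressed in bytes per second, and this sketch assumes the app's field uses the same unit (the guide does not say). Under that assumption, 500000 is about 488 KiB/s, and a 100 MiB site needs at least about three and a half minutes. The helper names are illustrative.

```bash
# Convert a rate in bytes/second to whole KiB/second (rounded down).
rate_to_kib() {
  echo $(( $1 / 1024 ))
}

# Lower bound, in seconds, on the time to fetch SIZE bytes at RATE bytes/second
# (rounded up; ignores latency and per-request overhead).
min_seconds() {
  size=$1
  rate=$2
  echo $(( (size + rate - 1) / rate ))
}

rate_to_kib 500000                          # prints 488
min_seconds $((100 * 1024 * 1024)) 500000   # prints 210
```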
+ ## 🛠️ Common Commands
+
+ ```bash
+ # View logs
+ docker logs -f httrack-scraper
+
+ # Stop the app
+ docker stop httrack-scraper
+
+ # Restart the app
+ docker restart httrack-scraper
+
+ # Or use the deploy script
+ ./deploy.sh logs      # View logs
+ ./deploy.sh stop      # Stop app
+ ./deploy.sh restart   # Restart app
+ ./deploy.sh clean     # Remove everything
+ ```
+
+ ## ⚠️ Before You Start
+
+ **Important Reminders:**
+ - ✅ Only scrape websites you have permission to access
+ - ✅ Respect robots.txt files
+ - ✅ Be aware of copyright laws
+ - ✅ Use reasonable rate limits
+ - ✅ Check the website's Terms of Service
+
+ **Good Use Cases:**
+ - Backing up your own website
+ - Archiving with permission
+ - Research projects (with authorization)
+ - Personal offline browsing (legally obtained content)
+
+ ## 🐛 Quick Troubleshooting
+
+ ### Port Already in Use
+ ```bash
+ # Remove the old container first so its name is free
+ docker rm -f httrack-scraper
+ # Map a different host port to the app's port 7860
+ docker run -d -p 8080:7860 --name httrack-scraper httrack-scraper
+ # Then visit http://localhost:8080
+ ```
+
+ ### HTTrack Not Found (Local Installation)
+ ```bash
+ # Install it first
+ sudo apt-get update
+ sudo apt-get install httrack
+ ```
+
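Before installing anything, it can help to confirm which tools are actually missing. A minimal sketch using the POSIX `command -v` builtin; `require_cmd` is an illustrative helper name, and the tool list is taken from the local setup steps in this guide (on some systems the interpreter is named `python3` rather than `python`):

```bash
# Succeed if the named program is on PATH; otherwise report it on stderr.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing: $1" >&2
  return 1
}

# Check each prerequisite; keep going so every missing tool is reported.
for tool in httrack python pip; do
  require_cmd "$tool" || true
done
```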
+ ### Container Won't Start
+ ```bash
+ # Check the logs
+ docker logs httrack-scraper
+
+ # Try rebuilding
+ docker stop httrack-scraper
+ docker rm httrack-scraper
+ docker build -t httrack-scraper .
+ docker run -d -p 7860:7860 --name httrack-scraper httrack-scraper
+ ```
+
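The rebuild sequence above prints errors at `docker stop` and `docker rm` when the container does not exist yet. One way to make it repeatable is to force-remove the container and ignore the failure when there is nothing to remove. This is a sketch, not part of `deploy.sh`; the `rebuild_container` name and the `DOCKER` override variable (used here for a dry run) are assumptions.

```bash
# Rebuild the image and start a fresh container.
# Usage: rebuild_container [HOST_PORT]   (default 7860)
rebuild_container() {
  docker_bin=${DOCKER:-docker}
  name=httrack-scraper
  host_port=${1:-7860}

  # Remove any old container, running or stopped; ignore "no such container".
  "$docker_bin" rm -f "$name" >/dev/null 2>&1 || true

  # Start the container only if the build succeeded.
  "$docker_bin" build -t "$name" . &&
    "$docker_bin" run -d -p "$host_port:7860" --name "$name" "$name"
}

# Dry run: with DOCKER=echo the build and run commands are printed, not executed.
DOCKER=echo rebuild_container 8080
```

Calling `rebuild_container` without `DOCKER=echo` runs the real Docker commands from the directory that contains the Dockerfile.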
+ ## 📚 Next Steps
+
+ - Read the full [README.md](README.md) for detailed documentation
+ - Experiment with different depth settings
+ - Try scraping progressively larger sites
+ - Check out the logs to understand what's happening
+
+ ## 💡 Tips for Better Results
+
+ 1. **Start Small**: Test with depth 1-2 first
+ 2. **Increase Gradually**: Only go deeper if needed
+ 3. **Monitor Logs**: Watch for errors or issues
+ 4. **Be Patient**: Large sites take time
+ 5. **Check Size**: Monitor disk space for big scrapes
+
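For the disk-space tip, standard tools are enough. A sketch assuming the mirrored files land in a directory you can reach from the shell; this guide does not name that directory, so the path in the usage comments is a placeholder, and `dir_size_kb` is an illustrative helper name.

```bash
# Print the total size of a directory tree in kilobytes (POSIX `du -k`).
dir_size_kb() {
  du -sk "$1" | cut -f1
}

# Example usage (placeholder path, not run here):
# dir_size_kb /path/to/scrape/output   # space used by the scrape so far
# df -k /path/to/scrape/output         # free space left on that filesystem
```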
+ ## 🎓 Learning Resources
+
+ - HTTrack Documentation: https://www.httrack.com/html/
+ - Gradio Documentation: https://www.gradio.app/docs/
+ - Docker Documentation: https://docs.docker.com/
+
+ ---
+
+ **Need Help?** Check the troubleshooting section in the main README or review the logs for error messages.
+
+ **Ready to scrape?** Run `./deploy.sh deploy` and start downloading! 🚀