working on proxy-service

parent b23ca42f4d
commit 953b361d30
14 changed files with 1020 additions and 56 deletions

apps/data-service/src/PROXY-SERVICE-README.md

# Proxy Service

A proxy management service for the Stock Bot platform. It integrates with the platform's existing libraries (Redis cache, logger, http-client) to scrape, validate, and manage proxies.

## Features

- **Automatic Proxy Scraping**: Scrapes free proxies from multiple public sources
- **Proxy Validation**: Tests proxy connectivity and response times
- **Redis Caching**: Stores proxy data with TTL and working status in Redis
- **Health Monitoring**: Periodic health checks for working proxies
- **Structured Logging**: Comprehensive logging with the platform's logger
- **HTTP Client Integration**: Seamless integration with the existing http-client library
- **Background Processing**: Non-blocking proxy validation and refresh jobs

## Quick Start

```typescript
import { HttpClient } from '@stock-bot/http-client';
import { proxyService } from './services/proxy.service.js';

// Start the proxy service with automatic refresh
await proxyService.queueRefreshProxies(30 * 60 * 1000); // Refresh every 30 minutes
await proxyService.startHealthChecks(15 * 60 * 1000); // Health check every 15 minutes

// Get a working proxy; none may be available yet
const proxy = await proxyService.getWorkingProxy();
if (!proxy) {
  throw new Error('No working proxies available');
}

// Use the proxy with HttpClient
const client = new HttpClient({ proxy });
const response = await client.get('https://api.example.com/data');
```

## Core Methods

### Proxy Management

```typescript
// Scrape proxies from default sources
const count = await proxyService.scrapeProxies();

// Scrape from custom sources
const customSources = [
  {
    url: 'https://example.com/proxy-list.txt',
    type: 'free',
    format: 'text',
    // parseCustomFormat is a user-supplied function (not defined in this README)
    parser: (content) => parseCustomFormat(content)
  }
];
await proxyService.scrapeProxies(customSources);

// Test a specific proxy (`proxy` is a proxy object, e.g. from getWorkingProxy())
const result = await proxyService.checkProxy(proxy, 'http://httpbin.org/ip');
console.log(`Proxy working: ${result.isWorking}, Response time: ${result.responseTime}ms`);
```

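
The `parser` callback above delegates to `parseCustomFormat`, which this README does not define. As a minimal sketch, assuming the source is a plain-text list with one `host:port` entry per line, such a parser might look like the following. The `ParsedProxy` shape is hypothetical and only mirrors the `type`, `host`, and `port` fields used in this README's examples.

```typescript
// Hypothetical proxy shape; the service's real type may differ.
interface ParsedProxy {
  type: 'http';
  host: string;
  port: number;
}

// Hypothetical parser for a plain-text list with one "host:port" entry per line.
function parseCustomFormat(content: string): ParsedProxy[] {
  const proxies: ParsedProxy[] = [];
  for (const rawLine of content.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines and comment lines
    if (line === '' || line.startsWith('#')) continue;
    const separator = line.lastIndexOf(':');
    // Skip lines with no colon or with an empty host
    if (separator <= 0) continue;
    const host = line.slice(0, separator);
    const port = Number(line.slice(separator + 1));
    // Keep only entries with a valid TCP port
    if (!Number.isInteger(port) || port < 1 || port > 65535) continue;
    proxies.push({ type: 'http', host, port });
  }
  return proxies;
}
```

Malformed lines are skipped rather than thrown on, so one bad entry in a public list does not discard the rest.
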
### Proxy Retrieval

```typescript
// Get a single working proxy
const proxy = await proxyService.getWorkingProxy();

// Get multiple working proxies
const proxies = await proxyService.getWorkingProxies(10);

// Get all proxies (including non-working)
const allProxies = await proxyService.getAllProxies();
```

### Statistics and Monitoring

```typescript
// Get proxy statistics
const stats = await proxyService.getProxyStats();
console.log(`Total: ${stats.total}, Working: ${stats.working}, Failed: ${stats.failed}`);
console.log(`Average response time: ${stats.avgResponseTime}ms`);
```

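
To make the four statistics fields concrete, here is a rough sketch of how they could be derived from a list of proxy records. This is not the service's implementation: `ProxyRecord` and `computeProxyStats` are hypothetical names, and the sketch assumes that every proxy not marked as working counts as failed and that the average covers only working proxies with a measured response time.

```typescript
// Hypothetical record shape: `isWorking` and `responseTime` mirror the fields
// returned by checkProxy; how the service stores them is an assumption.
interface ProxyRecord {
  host: string;
  port: number;
  isWorking: boolean;
  responseTime?: number; // milliseconds, present only when measured
}

interface ProxyStats {
  total: number;
  working: number;
  failed: number;
  avgResponseTime: number;
}

// Hypothetical helper that aggregates the fields shown in the example above.
function computeProxyStats(proxies: ProxyRecord[]): ProxyStats {
  const working = proxies.filter((p) => p.isWorking);
  // Average only over working proxies that have a measured response time
  const timed = working.filter((p) => typeof p.responseTime === 'number');
  const totalTime = timed.reduce((sum, p) => sum + (p.responseTime ?? 0), 0);
  return {
    total: proxies.length,
    working: working.length,
    failed: proxies.length - working.length,
    avgResponseTime: timed.length > 0 ? Math.round(totalTime / timed.length) : 0,
  };
}
```
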
### Maintenance

```typescript
// Clear all proxy data
await proxyService.clearProxies();

// Graceful shutdown
await proxyService.shutdown();
```

## Configuration

The service uses environment variables for Redis configuration:

```bash
REDIS_HOST=localhost    # Redis host (default: localhost)
REDIS_PORT=6379         # Redis port (default: 6379)
REDIS_DB=0              # Redis database (default: 0)
```

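
As an illustration of how these variables and their documented defaults could be resolved, here is a small sketch. `loadRedisConfig` and `parseIntOr` are hypothetical helpers, not part of the service; the actual loading is presumably done by the cache library.

```typescript
interface RedisConfig {
  host: string;
  port: number;
  db: number;
}

// Parse a non-negative integer, falling back to the default when the value
// is unset, empty, or not a valid integer.
function parseIntOr(value: string | undefined, fallback: number): number {
  if (value === undefined || value.trim() === '') return fallback;
  const parsed = Number(value);
  return Number.isInteger(parsed) && parsed >= 0 ? parsed : fallback;
}

// Hypothetical loader applying the defaults documented above.
function loadRedisConfig(env: Record<string, string | undefined>): RedisConfig {
  const host = env['REDIS_HOST'];
  return {
    host: host !== undefined && host.trim() !== '' ? host : 'localhost',
    port: parseIntOr(env['REDIS_PORT'], 6379),
    db: parseIntOr(env['REDIS_DB'], 0),
  };
}
```

In a Node.js process this would be called as `loadRedisConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.
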
## Proxy Sources

Default sources include:

- TheSpeedX/PROXY-List (HTTP proxies)
- clarketm/proxy-list (HTTP proxies)
- ShiftyTR/Proxy-List (HTTP proxies)
- monosans/proxy-list (HTTP proxies)

### Custom Proxy Sources

You can add custom proxy sources with different formats:

```typescript
const customSource = {
  url: 'https://api.example.com/proxies',
  type: 'premium',
  format: 'json',
  parser: (content) => {
    const data = JSON.parse(content);
    return data.proxies.map(p => ({
      type: 'http',
      host: p.ip,
      port: p.port,
      username: p.user,
      password: p.pass
    }));
  }
};
```

## Integration Examples

### With Market Data Collection

```typescript
import { proxyService } from './services/proxy.service.js';
import { HttpClient } from '@stock-bot/http-client';

async function fetchMarketDataWithProxy(symbol: string) {
  const proxy = await proxyService.getWorkingProxy();
  if (!proxy) {
    throw new Error('No working proxies available');
  }

  const client = new HttpClient({
    proxy,
    timeout: 10000,
    retries: 2
  });

  try {
    return await client.get(`https://api.example.com/stock/${symbol}`);
  } catch (error) {
    // Re-check the proxy, then rethrow so the caller can retry with another one
    await proxyService.checkProxy(proxy);
    throw error;
  }
}
```

### Proxy Rotation Strategy

```typescript
import { proxyService } from './services/proxy.service.js';
import { HttpClient } from '@stock-bot/http-client';

async function fetchWithProxyRotation(urls: string[]) {
  const proxies = await proxyService.getWorkingProxies(urls.length);
  if (proxies.length === 0) {
    throw new Error('No working proxies available');
  }

  // Assign proxies round-robin; with fewer proxies than URLs, some are reused
  const promises = urls.map(async (url, index) => {
    const proxy = proxies[index % proxies.length];
    const client = new HttpClient({ proxy });
    return client.get(url);
  });

  // allSettled never rejects; inspect each result's status for failures
  return Promise.allSettled(promises);
}
```

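
Because `fetchWithProxyRotation` returns the output of `Promise.allSettled`, callers get one `{ status, value | reason }` entry per URL rather than a thrown error. A small sketch of separating successes from failures follows; `partitionSettled` and `SettledSummary` are hypothetical names introduced for illustration.

```typescript
interface SettledSummary<T> {
  succeeded: T[];
  failures: unknown[];
}

// Hypothetical helper: split Promise.allSettled results into values and reasons.
function partitionSettled<T>(results: PromiseSettledResult<T>[]): SettledSummary<T> {
  const summary: SettledSummary<T> = { succeeded: [], failures: [] };
  for (const result of results) {
    if (result.status === 'fulfilled') {
      summary.succeeded.push(result.value);
    } else {
      summary.failures.push(result.reason);
    }
  }
  return summary;
}
```

The failures could then be retried through a different proxy, while the successes are processed immediately.
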
## Cache Structure

The service stores data in Redis with the following structure:

```
proxy:{host}:{port}           # Individual proxy data with status
proxy:working:{host}:{port}   # Working proxy references
proxy:stats                   # Cached statistics
```

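
The key layout above can be captured in a few small helpers so that key strings are built in one place. This is only a sketch; the function and constant names are hypothetical and not taken from the service.

```typescript
// Hypothetical key builders following the cache structure listed above.
const STATS_KEY = 'proxy:stats';

// Key for an individual proxy's data and status
function proxyKey(host: string, port: number): string {
  return `proxy:${host}:${port}`;
}

// Key for a working-proxy reference
function workingProxyKey(host: string, port: number): string {
  return `proxy:working:${host}:${port}`;
}
```
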
## Logging

The service provides structured logging for all operations:

- Proxy scraping progress and results
- Validation results and timing
- Cache operations and statistics
- Error conditions and recovery

## Background Jobs

### Refresh Job

- Scrapes proxies from all sources
- Removes duplicates
- Stores in cache with metadata
- Triggers background validation

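
The duplicate-removal step of the refresh job can be sketched as follows. This assumes two entries are duplicates when their host (compared case-insensitively) and port match; the README does not state the service's actual criterion, and `dedupeProxies` is a hypothetical name.

```typescript
interface ScrapedProxy {
  host: string;
  port: number;
}

// Hypothetical helper: keep the first occurrence of each host:port pair.
function dedupeProxies<T extends ScrapedProxy>(proxies: T[]): T[] {
  const seen = new Set<string>();
  const unique: T[] = [];
  for (const proxy of proxies) {
    const key = `${proxy.host.toLowerCase()}:${proxy.port}`;
    if (seen.has(key)) continue; // already collected from an earlier source
    seen.add(key);
    unique.push(proxy);
  }
  return unique;
}
```

Keeping the first occurrence preserves source order, so proxies from earlier sources take precedence when the same endpoint appears in several lists.
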
### Health Check Job

- Tests existing working proxies
- Updates status in cache
- Removes failed proxies from working set
- Maintains proxy pool health

### Validation Job

- Tests newly scraped proxies
- Updates working status
- Measures response times
- Runs in background to avoid blocking

## Error Handling

The service includes comprehensive error handling:

- Network failures during scraping
- Redis connection issues
- Proxy validation timeouts
- Invalid proxy formats
- Cache operation failures

All errors are logged with context and don't crash the service.

## Performance Considerations

- **Concurrent Validation**: Processes proxies in chunks of 50
- **Rate Limiting**: Includes delays between validation chunks
- **Cache Efficiency**: Uses TTL and working proxy sets
- **Memory Management**: Processes large proxy lists in batches
- **Background Processing**: Validation doesn't block main operations

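
The chunked validation described above can be sketched roughly as follows. The chunk size of 50 comes from this README; the 1000 ms delay, the `check` callback, and the names `chunk`, `sleep`, and `validateInChunks` are assumptions made for illustration, not the service's actual code.

```typescript
// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical driver: check each batch concurrently, pausing between batches.
async function validateInChunks<T, R>(
  items: T[],
  check: (item: T) => Promise<R>,
  chunkSize = 50, // chunk size stated in this README
  delayMs = 1000 // assumed delay; the README does not give a value
): Promise<PromiseSettledResult<R>[]> {
  const results: PromiseSettledResult<R>[] = [];
  const batches = chunk(items, chunkSize);
  let index = 0;
  for (const batch of batches) {
    // Within a batch all checks run concurrently; one failure does not abort the rest
    const settled = await Promise.allSettled(batch.map((item) => check(item)));
    results.push(...settled);
    index += 1;
    // Pause between batches, but not after the last one
    if (index < batches.length) await sleep(delayMs);
  }
  return results;
}
```

Bounding concurrency to one batch at a time limits open sockets and memory, and the pause between batches is what provides the rate limiting.
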
## Dependencies

- `@stock-bot/cache`: Redis caching with TTL support
- `@stock-bot/logger`: Structured logging with Loki integration
- `@stock-bot/http-client`: HTTP client with built-in proxy support
- `ioredis`: Redis client (via cache library)
- `pino`: High-performance logging (via logger library)

## Limitations

Due to the current Redis cache provider interface:

- Key pattern matching not available
- Bulk operations limited
- Set operations (sadd, srem) not directly supported

The service works around these limitations with individual key operations. Native set operations are listed under Future Enhancements.

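
For illustration only, one way to emulate `sadd`/`srem` when a cache offers just get and set on single keys is to keep a JSON-serialized member list under one index key. This is a different technique from the per-key approach the service uses, and the helper names below are hypothetical. Note that the read-modify-write cycle is not atomic, so concurrent writers could lose updates.

```typescript
// Decode a stored index value; treat missing or malformed data as an empty set.
function readIndex(serialized: string | null): string[] {
  if (serialized === null) return [];
  try {
    const parsed: unknown = JSON.parse(serialized);
    if (!Array.isArray(parsed)) return [];
    return parsed.filter((m): m is string => typeof m === 'string');
  } catch (error) {
    return [];
  }
}

// Emulates SADD: returns the new serialized index with `member` included once.
function indexAdd(serialized: string | null, member: string): string {
  const members = readIndex(serialized);
  if (members.indexOf(member) === -1) members.push(member);
  return JSON.stringify(members);
}

// Emulates SREM: returns the new serialized index without `member`.
function indexRemove(serialized: string | null, member: string): string {
  return JSON.stringify(readIndex(serialized).filter((m) => m !== member));
}
```

A caller would read the index key, pass its value through `indexAdd` or `indexRemove`, and write the returned string back to the same key.
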
## Future Enhancements

- Premium proxy source integration
- Proxy performance analytics
- Geographic proxy distribution
- Protocol-specific proxy pools (HTTP, HTTPS, SOCKS)
- Enhanced caching with set operations
- Proxy authentication management