- ✨ Smart Web Capturing - Download complete websites with all resources (images, CSS, JavaScript, fonts)
- 🔄 Multiple Engines - Choose from standard requests, Selenium, or Playwright for high-fidelity captures
- 📚 Bulk Archive - Download multiple websites at once with the batch processor
- 🔍 Content Search - Find exactly what you need with full-text search across your archives
- 🏷️ Tagging System - Organize websites with custom tags for efficient categorization
- 📝 Notes & Annotations - Add context with your own notes for each saved website
- ✏️ Built-in Editor - Modify archived content directly within the application
- 📦 Import/Export - Share your archives with others or back them up securely
- Python 3.7 or higher
- PyQt6
- Internet connection for downloading websites
```bash
# Install from PyPI
pip install website-archiver

# Launch the application
website-archiver
```
```bash
# Clone the repository
git clone https://github.com/Oliverwebdev/WebArchiver
cd WebArchiver

# Create and activate a virtual environment (recommended)
python -m venv venv

# On Windows
venv\Scripts\activate

# On macOS/Linux
source venv/bin/activate

# Install requirements
pip install -r requirements.txt

# Run the application
python main.py
```
For the best archiving experience, install additional engines:
```bash
# For Playwright support (recommended for complex websites)
pip install playwright
playwright install chromium

# For Selenium support
pip install selenium
```
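Website Archiver drives these engines from its own UI, but if you want to confirm that Playwright and Chromium installed correctly, a standalone snippet like the one below (not part of the application) renders a page and prints its HTML. JavaScript-heavy sites are where a browser engine pays off: the returned markup includes content that a plain HTTP request would never see.

```python
# Standalone check that the Playwright engine works.
# This is not how Website Archiver invokes it internally; it only
# verifies that Chromium can render a page and return its HTML.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=True)
    page = browser.new_page()
    page.goto("https://example.com", wait_until="networkidle")
    html = page.content()  # fully rendered DOM, including JS-generated markup
    browser.close()

print(html[:200])
```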
- Launch Website Archiver
- Go to the Download tab
- Enter the URL you want to archive
- Select your preferred download options
- Click Download
- Your archived website will appear in the Home tab
- Search: Use the search bar to find websites by title, URL, or content (see the example after this list)
- Filter by Tags: Select a tag from the dropdown to filter related websites
- Edit Website: Click "Edit" to modify the website's content, tags, or properties
- Add Notes: Record your thoughts or context about why you archived the site
- Export: Share your archives with others using the export functionality
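How the search bar is implemented isn't documented here, but since archives are cataloged in an SQLite database (see database_manager.py below), full-text search over title, URL, and content can be done with SQLite's FTS5 extension. The table and column names in this sketch are assumptions for illustration, not the application's real schema.

```python
# Hypothetical full-text search with SQLite FTS5 (bundled with CPython's
# SQLite in normal builds). Table and column names are assumptions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE archives USING fts5(title, url, content)")
conn.execute(
    "INSERT INTO archives VALUES (?, ?, ?)",
    ("Example Domain", "https://example.com", "This domain is for use in examples."),
)

# One query matches against title, URL, and page text.
rows = conn.execute(
    "SELECT title, url FROM archives WHERE archives MATCH ?", ("examples",)
).fetchall()
print(rows)  # [('Example Domain', 'https://example.com')]
```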
Visit the Settings tab to configure the following (an example configuration appears after the list):
- Storage location for your archives
- Default download engine
- Resource options (images, CSS, JS, fonts)
- Timeout and concurrency settings
- And much more!
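The format used by config_manager.py isn't shown here; purely as an illustration, the settings above might map to defaults along these lines. Every key name below is a hypothetical stand-in, not the project's actual schema.

```python
# Hypothetical defaults illustrating the settings above; key names are
# assumptions, not the real structure used by config_manager.py.
DEFAULT_CONFIG = {
    "storage_path": "~/WebArchiver/archives",  # where archives are stored
    "default_engine": "playwright",            # "requests", "selenium", or "playwright"
    "resources": {"images": True, "css": True, "javascript": True, "fonts": True},
    "timeout_seconds": 30,                     # per-request timeout
    "max_concurrent_downloads": 4,             # concurrency limit
}
```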
Website Archiver captures web content using a multi-step process (sketched in code after the list):
- Analysis: Evaluates the target website structure
- Download: Retrieves HTML content using the selected engine
- Resource Collection: Gathers linked resources (images, styles, scripts)
- Path Rewriting: Modifies resource paths to work offline
- Storage: Organizes content in a structured filesystem
- Indexing: Catalogs the archive in the searchable database
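The real logic lives in scraper.py and isn't reproduced here. As a rough, simplified sketch of the middle steps (Download, Resource Collection, Path Rewriting, and Storage), using the same libraries credited in the Acknowledgments (Requests and Beautiful Soup):

```python
# Simplified sketch of the download / resource-collection / path-rewriting steps.
# This illustrates the general idea only; it is not the project's implementation.
from pathlib import Path
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

def archive_page(url: str, out_dir: str = "archive") -> Path:
    out = Path(out_dir)
    (out / "assets").mkdir(parents=True, exist_ok=True)

    html = requests.get(url, timeout=30).text            # Download
    soup = BeautifulSoup(html, "html.parser")

    for tag, attr in (("img", "src"), ("link", "href"), ("script", "src")):
        for node in soup.find_all(tag):
            ref = node.get(attr)
            if not ref:
                continue
            resource_url = urljoin(url, ref)              # Resource Collection
            name = Path(urlparse(resource_url).path).name or "resource"
            local = out / "assets" / name
            try:
                local.write_bytes(requests.get(resource_url, timeout=30).content)
            except requests.RequestException:
                continue                                  # skip resources that fail
            node[attr] = f"assets/{name}"                 # Path Rewriting

    index = out / "index.html"                            # Storage
    index.write_text(str(soup), encoding="utf-8")
    return index

# archive_page("https://example.com")
```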
The application architecture includes:
- `config_manager.py`: Manages application configuration
- `database_manager.py`: Handles SQLite database operations
- `scraper.py`: Core web scraping functionality
- `session_manager.py`: Manages application state between sessions
- `ui/`: PyQt6-based user interface components
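The module names above come from the repository, but their interfaces aren't documented here. The snippet below is a purely hypothetical sketch of how such components might be wired together at startup; the class names and method signatures are illustrative assumptions, not the project's actual code.

```python
# Hypothetical wiring of the components listed above; classes and methods
# are illustrative stand-ins, not the project's real API.
import sys
from PyQt6.QtWidgets import QApplication, QMainWindow

class ConfigManager:            # stand-in for config_manager.py
    def load(self) -> dict:
        return {"default_engine": "requests"}

class DatabaseManager:          # stand-in for database_manager.py
    def __init__(self, path: str = "archives.db"):
        self.path = path

def main() -> None:
    app = QApplication(sys.argv)
    config = ConfigManager().load()
    db = DatabaseManager()      # config and db would be handed to the UI layer
    window = QMainWindow()      # the real interface lives under ui/
    window.setWindowTitle(f"Website Archiver ({config['default_engine']} engine)")
    window.show()
    sys.exit(app.exec())

if __name__ == "__main__":
    main()
```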
Want to contribute to Website Archiver? Great! We welcome contributions of all kinds.
- Fork the repository
- Clone your fork: `git clone https://github.com/Oliverwebdev/WebArchiver`
- Create a virtual environment: `python -m venv venv`
- Activate it and install dev dependencies: `pip install -r requirements-dev.txt`
- Make your changes and submit a pull request!
This project is licensed under the MIT License - see the LICENSE file for details.
- Beautiful Soup for HTML parsing
- PyQt for the GUI framework
- Requests for HTTP functionality
- Selenium and Playwright for browser automation
- All the open source contributors who made this project possible
If you find Website Archiver useful, please consider:
- Starring the repository on GitHub
- Reporting issues or suggesting features
- Contributing code or documentation improvements
- Sharing the project with others
Website Archiver - Because the web is too important to lose.