Testing Guide¶
This document outlines how to test Medium Converter effectively.
Setting Up the Test Environment¶
Prerequisites¶
- Python 3.11 or newer
- Poetry installed
- Git
Installation¶
# Clone the repository if you haven't already
git clone https://github.com/MarcusElwin/medium-converter.git
cd medium-converter
# Install dependencies with development extras
poetry install --with dev
# Activate the virtual environment
poetry shell
Running Tests¶
Running All Tests¶
# Run all tests
pytest
# Run with increased verbosity
pytest -v
# Run with coverage report
pytest --cov=medium_converter
# Generate HTML coverage report
pytest --cov=medium_converter --cov-report=html
Running Specific Tests¶
# Run tests in a specific file
pytest tests/unit/test_models.py
# Run a specific test class
pytest tests/unit/test_models.py::TestArticle
# Run a specific test function
pytest tests/unit/test_models.py::TestArticle::test_article_creation
# Run tests matching a pattern
pytest -k "models or parser"
Test Structure¶
The test suite is organized as follows:
tests/
├── conftest.py # Shared fixtures
├── unit/ # Unit tests
│ ├── test_models.py # Tests for data models
│ ├── test_parser.py # Tests for HTML parsing
│ └── ...
└── integration/ # Integration tests
├── test_fetcher.py # Tests for article fetching
├── test_exports.py # Tests for export formats
└── ...
Test Types¶
Unit Tests¶
Unit tests focus on testing individual components in isolation. These tests are fast and should not make external network calls or depend on external services.
# Example unit test
def test_markdown_exporter_formatting(sample_article):
exporter = MarkdownExporter()
result = exporter.export(sample_article)
assert sample_article.title in result
assert sample_article.author in result
assert "##" in result # Should contain headings
Integration Tests¶
Integration tests verify that different components work together correctly. These tests may involve file system operations or mocked external services.
# Example integration test with mocked response
def test_article_fetch_and_parse(mock_httpx_client):
# Setup mock response
mock_html = "<html><body><article>...</article></body></html>"
mock_httpx_client.get.return_value.text = mock_html
# Run the test
article = fetch_and_parse("https://example.com/article")
# Verify results
assert article.title is not None
assert len(article.content) > 0
End-to-End Tests¶
End-to-end tests verify the whole system works together. These are marked with pytest.mark.e2e and are skipped by default unless explicitly enabled.
@pytest.mark.e2e
def test_full_conversion_process():
# This test actually downloads a public Medium article
result = convert_article_sync(
url="https://medium.com/official-public-test-article",
output_format="markdown",
output_path=None # Return as string
)
assert result is not None
assert len(result) > 100
Test Fixtures¶
Common test fixtures are defined in conftest.py:
@pytest.fixture
def sample_article():
"""Create a sample article for testing."""
return Article(
title="Sample Article",
author="Test Author",
date="2023-01-01",
content=[...],
# ...
)
@pytest.fixture
def mock_httpx_client(monkeypatch):
"""Mock the HTTPX client for testing."""
mock_client = MagicMock()
mock_response = MagicMock()
mock_client.return_value.__aenter__.return_value = mock_client
mock_client.get.return_value = mock_response
monkeypatch.setattr("httpx.AsyncClient", mock_client)
return mock_client
Mocking External Services¶
Mocking HTTP Requests¶
For tests that would normally make HTTP requests, use the responses package to mock responses:
import responses
@responses.activate
def test_fetch_article():
# Mock the HTTP response
responses.add(
responses.GET,
"https://medium.com/test-article",
body="<html><body><article>Test content</article></body></html>",
status=200,
)
# Test the function
html = fetch_article_sync("https://medium.com/test-article")
assert "Test content" in html
Mocking LLM Providers¶
For testing LLM enhancement without making actual API calls:
@pytest.fixture
def mock_llm_client(monkeypatch):
"""Mock LLM client for testing."""
mock_client = MagicMock()
mock_client.generate.return_value = "Enhanced test content"
monkeypatch.setattr(
"medium_converter.llm.providers.get_llm_client",
lambda config: mock_client
)
return mock_client
# Then in your test
def test_enhance_article(sample_article, mock_llm_client):
enhanced = enhance_article_sync(sample_article)
assert "Enhanced test content" in str(enhanced.content)
assert mock_llm_client.generate.called
Test Data¶
Sample test data is stored in the tests/data directory:
tests/data/
├── html/ # Sample HTML files
│ ├── article1.html # Complete article
│ └── paywall.html # Article with paywall
├── responses/ # Sample API responses
└── expected/ # Expected output files
├── markdown/ # Expected Markdown output
└── pdf/ # Expected PDF output
Load test data in tests like this:
import os
def test_parse_article_with_real_html():
# Load test HTML file
html_path = os.path.join(os.path.dirname(__file__), "../data/html/article1.html")
with open(html_path, "r", encoding="utf-8") as f:
html = f.read()
# Test parsing
article = parse_article(html)
assert article.title == "Expected Title"
Writing New Tests¶
Test-Driven Development¶
For new features, consider following test-driven development (TDD):
- Write a failing test that describes the expected behavior
- Implement the minimum code to make the test pass
- Refactor the code while ensuring tests continue to pass
Test Naming Convention¶
Tests should follow this naming convention:
test_[function_name]_[scenario]_[expected_result]
Examples:
- test_parse_article_with_paywall_returns_partial_content
- test_markdown_exporter_with_images_includes_image_links
Testing Complex Async Code¶
Use pytest-asyncio for testing asynchronous code:
import pytest
@pytest.mark.asyncio
async def test_async_function():
result = await async_function()
assert result == expected_value
Test Coverage¶
Aim for high test coverage, particularly for critical code paths:
# Generate coverage report
pytest --cov=medium_converter --cov-report=term-missing
# Generate HTML report
pytest --cov=medium_converter --cov-report=html
The HTML report will be available in the htmlcov directory.
Continuous Integration¶
Tests are automatically run on GitHub Actions for: - Each pull request - Commits to the main branch
The CI pipeline runs: 1. Linting with ruff and black 2. Type checking with mypy 3. Unit tests 4. Integration tests 5. Coverage reporting
Test Best Practices¶
- Keep tests fast: Slow tests discourage frequent testing
- One assertion per test: Tests should verify one specific behavior
- Use descriptive test names: Names should explain the test's purpose
- Don't test implementation details: Test behavior, not how it's implemented
- Don't skip tests without a good reason: Skipped tests often indicate problems
- Run tests often: Catch issues early
- Avoid test interdependency: Tests should be able to run in any order
Troubleshooting Tests¶
Common Issues¶
- Flaky tests: If tests sometimes fail, they may have timing issues or external dependencies
- Slow tests: Look for unnecessary I/O or processing in tests
- Failing tests: Use
pytest -v --tb=nativefor better error information