saulteafarmer/rss2podcast


mirror of https://github.com/jooray/rss2podcast.git synced 2025-06-23 16:05:40 +00:00


Juraj Bednar 5304baea97 Add recommended platform

2025-05-20 15:47:38 +02:00

static

initial commit

2025-05-20 15:32:43 +02:00

templates

initial commit

2025-05-20 15:32:43 +02:00

.gitignore

initial commit

2025-05-20 15:32:43 +02:00

add_website.py

initial commit

2025-05-20 15:32:43 +02:00

app.py

initial commit

2025-05-20 15:32:43 +02:00

config.json.sample

initial commit

2025-05-20 15:32:43 +02:00

content_processing.py

initial commit

2025-05-20 15:32:43 +02:00

database-tool.py

initial commit

2025-05-20 15:32:43 +02:00

episode_processing_utils.py

initial commit

2025-05-20 15:32:43 +02:00

episode_processor.py

initial commit

2025-05-20 15:32:43 +02:00

episode-tool.py

initial commit

2025-05-20 15:32:43 +02:00

feed_downloader.py

initial commit

2025-05-20 15:32:43 +02:00

feed_generator.py

initial commit

2025-05-20 15:32:43 +02:00

main.py

initial commit

2025-05-20 15:32:43 +02:00

ollama_client.py

initial commit

2025-05-20 15:32:43 +02:00

process_website_queue.py

initial commit

2025-05-20 15:32:43 +02:00

pyproject.toml

initial commit

2025-05-20 15:32:43 +02:00

README-web.md

initial commit

2025-05-20 15:32:43 +02:00

README.md

Add recommended platform

2025-05-20 15:47:38 +02:00

utils.py

initial commit

2025-05-20 15:32:43 +02:00

web_utils.py

initial commit

2025-05-20 15:32:43 +02:00

web-config.json-sample

initial commit

2025-05-20 15:32:43 +02:00

README.md

RSS-to-Podcast Converter

Motivation: Not everyone has time to read, but reading is essential. This tool lets you listen to what you want to read instead. It has two modes: you can convert your own blog into a podcast, or you can create a personal podcast from articles from all around the web.

A Python-based application that converts blog posts from an RSS feed into a value-for-value-enabled podcast, allowing listeners to engage with content via audio. This project automates the transformation of blog articles into podcast episodes by using text-to-speech technology, providing an easy way for listeners to consume written content as audio.

Another use-case is converting websites into podcast episodes. They are added manually using add_website.py. In this case, there is no source RSS feed.

For this use-case, there is also a web application, where users can generate their own podcast feed from articles they want to read. It powers loaditfor.me. Feel free to run your own instance, see README-web.md.

Showcase

Podcasts from blogs:

Juraj's blogs - my blogs (I'm Juraj, pleased to meet you)
Liberation travel newsletters - if you are interested in being a global opportunist, listen to this, it is amazing. Links to the audio are below the list of text issues of the newsletter.

Website:

If you want to generate your own private podcasts from articles you want to read, check out Loaditfor.me

I would appreciate it if you used a value4value Podcasting 2.0 app such as Fountain.fm to listen to these and contributed some sats over the Lightning Network.

Overview

The RSS-to-Podcast Converter pulls articles from an RSS feed, processes the content using a text-to-speech (TTS) model, and generates podcast episodes in MP3 format. The episodes are then assembled into an RSS feed compatible with podcast players, complete with metadata, audio, and descriptions. The generated podcast includes a value-for-value system, enabling micropayments and splits for creators via the Lightning Network.

This project uses a database to track processed episodes and ensure each article is only converted once. It allows manual skipping of articles that may not be suitable for TTS conversion, such as posts with embedded videos or images.
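The "convert each article only once" behaviour described above can be sketched as follows. This is only an illustration using the standard-library sqlite3 module; the table name, column names, status values, and function names are assumptions made for the example, not the project's actual schema (the project itself uses SQLAlchemy).

```python
import sqlite3


def open_tracking_db(path=":memory:"):
    """Open a database with a hypothetical tracking table keyed by article GUID."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS episodes ("
        "guid TEXT PRIMARY KEY, "
        "title TEXT, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def register_article(conn, guid, title):
    """Insert an article once; return True only if the GUID was not known yet."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO episodes (guid, title) VALUES (?, ?)",
        (guid, title),
    )
    conn.commit()
    return cur.rowcount == 1


def set_status(conn, guid, status):
    """Mark an article, for example as 'processed' or 'skipped' (assumed values)."""
    conn.execute("UPDATE episodes SET status = ? WHERE guid = ?", (status, guid))
    conn.commit()


def pending_guids(conn):
    """Return GUIDs that still need to be converted to audio."""
    rows = conn.execute(
        "SELECT guid FROM episodes WHERE status = 'pending' ORDER BY guid"
    )
    return [row[0] for row in rows]
```

Because the GUID is the primary key, downloading the same feed again does not create duplicate rows, and only articles still marked as pending are picked up for conversion.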

How to run

This tool was tested on Mac with Apple Silicon and local AI models. It uses three types of AI models:

An LLM, run through Ollama, that rewrites a blog post into text better suited for reading aloud. It handles bullet points and similar formatting so the result sounds more natural. The LLM is also used to verify that the rewrite did not hallucinate, keeping the original if in doubt.
A text-to-speech model, based on my project markdown2audio and my fork of StyleTTS2, for the rendering. The fork fixes a few bugs from the original. Note: It can also clone your voice, if you want your blogs to be read in your voice, which is pretty cool.
A speech-to-text model based on pywhispercpp to verify the generated audio. Yes, even the text-to-speech model sometimes hallucinates; if that happens, we try again with different settings.

I recommend running on Apple Silicon, where there's acceleration for both Ollama LLMs and Whisper. I have not tried it on anything else, but it might work (especially if you point it to another ollama instance over the network).
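The "verify the audio and retry with different settings" idea can be sketched like this. It is a rough illustration, not the project's code: synthesize and transcribe are caller-supplied placeholders standing in for the TTS and Whisper steps, and the similarity measure (difflib) and the 0.9 threshold are assumptions chosen for the example.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and drop punctuation so only the spoken words are compared."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(expected, transcript):
    """Return a 0..1 score for how closely the transcript matches the text."""
    return SequenceMatcher(None, normalize(expected), normalize(transcript)).ratio()


def synthesize_verified(text, synthesize, transcribe, settings_list, threshold=0.9):
    """Try each TTS setting until the transcribed audio matches the input text.

    synthesize(text, settings) -> audio and transcribe(audio) -> str are
    hypothetical callables supplied by the caller.
    Returns (audio, settings) on success or (None, None) if every attempt fails.
    """
    for settings in settings_list:
        audio = synthesize(text, settings)
        if similarity(text, transcribe(audio)) >= threshold:
            return audio, settings
    return None, None
```

Passing the two models in as functions keeps the retry logic independent of any particular TTS or speech-to-text library.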

Features

Automated Podcast Generation: Converts blog articles from an RSS feed into podcast episodes using a TTS model.
Customizable Episode Templates: Configurable episode description templates to link back to the original article.
Configurable Audio Stitching: Customizable introduction and conclusion audio segments for each episode.
Value-for-Value Integration: Supports micropayments with customizable splits, allowing listeners to contribute directly.
Automatic Skipping and Reprocessing: Tracks processed articles to avoid duplicate conversions, with options to reprocess episodes if necessary.
Optional LLM Processing and Verification: Uses an LLM to optimize text for TTS and verifies the content to avoid unsuitable output.
Customizable Output: Allows custom intro, outro, and conversion settings for generated MP3 files.
Manual Episode Addition: Supports adding episodes manually without a source RSS feed.
Flexible Feed Generation: Can regenerate the RSS feed without processing episodes.
Website Content Addition: Adds content from any website URL directly into the database for processing.
Support for SQLite and PostgreSQL: Now supports both SQLite and PostgreSQL databases via SQLAlchemy.

Skipping Podcast Entries

A new configuration option lets you automatically skip episodes whose titles match one or more regular expressions. This is especially useful if, for example, you want to avoid reprocessing content that is already in audio form. To use this feature, add a skip_regexps array to your configuration file. For instance, to skip any episode whose title contains "audio" or "Audio", add the following:

"skip_regexps": [
  "[Aa]udio"
]

When processing the RSS feed, if an episode’s title matches any of these patterns, the episode is immediately marked as skipped and is not converted to speech.
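A minimal sketch of how such a check could work, assuming the patterns are applied with Python's re.search (so a pattern may match anywhere in the title). The function name should_skip is made up for this example.

```python
import re


def should_skip(title, skip_regexps):
    """Return True if the title matches any of the configured patterns."""
    return any(re.search(pattern, title) for pattern in skip_regexps)


skip_regexps = ["[Aa]udio"]  # same value as in the configuration example above
print(should_skip("New audio version of my last post", skip_regexps))
print(should_skip("Weekly notes", skip_regexps))
```

With the pattern above, the first title would be skipped and the second would be processed normally.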

Dependencies

Install dependencies using Poetry:

poetry install
poetry add psycopg2-binary # for postgresql support

Database Options

This application supports both SQLite and PostgreSQL databases via SQLAlchemy.

Configuring the Database

In your configuration file (config.json), you can specify the database connection using the database or database_url parameter.

SQLite (Default): If you provide a filename in database, the application will use SQLite.
PostgreSQL: If you provide a PostgreSQL connection string (starting with postgresql://) in database_url or database, the application will connect to the specified PostgreSQL database.

Example of using PostgreSQL in config.json:

{
  "database_url": "postgresql://user:password@localhost:5432/mydatabase",
  ...
}
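To illustrate the selection rule, here is a small sketch of how the two settings could be turned into a SQLAlchemy-style connection URL. The function name and the precedence of database_url over database are assumptions for the example; the actual lookup in the project may differ.

```python
def resolve_database_url(config):
    """Turn the 'database' / 'database_url' settings into a connection URL.

    A value that already looks like a URL is passed through unchanged;
    a plain filename is treated as a SQLite database file.
    """
    value = config.get("database_url") or config.get("database")
    if not value:
        raise ValueError("config needs a 'database' or 'database_url' entry")
    if "://" in value:
        return value
    return f"sqlite:///{value}"


print(resolve_database_url({"database": "episodes.db"}))
# sqlite:///episodes.db
print(resolve_database_url({"database_url": "postgresql://user:password@localhost:5432/mydatabase"}))
# postgresql://user:password@localhost:5432/mydatabase
```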

Creating Database Tables

You need to create the database tables before running the application. Use the database-tool.py script to create tables.

For SQLite:

python database-tool.py create --db episodes.db

For PostgreSQL:

python database-tool.py create --db postgresql://user:password@localhost:5432/mydatabase

Migrating Between Databases

To migrate data between SQLite and PostgreSQL, use the migrate command in database-tool.py.

Example migrating from SQLite to PostgreSQL:

python database-tool.py migrate --from episodes.db --to postgresql://user:password@localhost:5432/mydatabase

Example migrating from PostgreSQL to SQLite:

python database-tool.py migrate --from postgresql://user:password@localhost:5432/mydatabase --to episodes.db

Configuration

The project uses a JSON configuration file to define input sources, output settings, and TTS processing details. See the sample configuration file (config.json.sample) for details.

Running the Application

When running the application, it will use the database specified in your configuration file.

For example:

python main.py --config config.json

Managing the Database

Use the database-tool.py script to manage your database, including creating tables and migrating data.

Creating Tables

python database-tool.py create --db [database_url_or_filename]

Migrating Data

python database-tool.py migrate --from [source_db_url_or_filename] --to [destination_db_url_or_filename]

Usage

Running the Conversion

The project includes a command-line interface to manage feed processing. Use the following command to start processing the feed:

python main.py --config config.json

Command-Line Options for `main.py`

--config: Path to the configuration JSON file.
--episode-limit: Limit the number of episodes to process.
--episode-guid: Process a specific episode by GUID.
--reprocess: Reprocess episodes that are already marked as processed.
--only-feed: Generate the RSS feed without processing episodes.

Example:

python main.py --config config.json --episode-limit 10 --reprocess
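For readers who want to script around these options, the following argparse sketch mirrors the flags listed above. It is not copied from main.py: the argument types, defaults, and whether --config is required are assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the documented flags (types and defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Convert an RSS feed into a podcast")
    parser.add_argument("--config", required=True,
                        help="Path to the configuration JSON file")
    parser.add_argument("--episode-limit", type=int, default=None,
                        help="Limit the number of episodes to process")
    parser.add_argument("--episode-guid", default=None,
                        help="Process a specific episode by GUID")
    parser.add_argument("--reprocess", action="store_true",
                        help="Reprocess episodes already marked as processed")
    parser.add_argument("--only-feed", action="store_true",
                        help="Generate the RSS feed without processing episodes")
    return parser


args = build_parser().parse_args(
    ["--config", "config.json", "--episode-limit", "10", "--reprocess"]
)
print(args.episode_limit, args.reprocess, args.only_feed)
```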

Managing Episodes with `episode-tool.py`

episode-tool.py allows you to manage episodes in the database, including adding new episodes manually.

Adding a New Episode Manually

echo "This is the content of the episode." | python episode-tool.py --new-episode --title "Episode Title" --config config.json

Options:
- --new-episode: Add a new episode to the database.
- --title: Title of the episode (required).
- --guid: GUID for the episode (optional). If not provided, it's generated based on the link or the current date and title.
- --link: Link associated with the episode (optional).
- --description: Description of the episode (optional).
- --date: Publication date of the episode (optional). Defaults to the current date and time.
- --markdown: Content is in Markdown format (default).
- --html: Content is in HTML format.
- --config: Path to the configuration JSON file.
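The GUID fallback described for --guid could look roughly like the sketch below. The exact scheme used by episode-tool.py is not documented here, so everything in this example is an assumption: using the link itself as the GUID, hashing the date and title with SHA-256, and the "manual-" prefix are illustrative choices only.

```python
import hashlib
from datetime import datetime, timezone


def make_guid(title, link=None, now=None):
    """Derive a GUID from the link if present, else from the date and title."""
    if link:
        return link
    now = now or datetime.now(timezone.utc)
    digest = hashlib.sha256(f"{now.isoformat()}|{title}".encode("utf-8")).hexdigest()
    return f"manual-{digest[:16]}"


print(make_guid("Episode Title", link="https://example.com/article"))
# https://example.com/article
print(make_guid("Episode Title"))  # value depends on the current time
```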

Other Episode Management Commands

List All Episode GUIDs:

python episode-tool.py --list-guids --config config.json

Mark an Episode as Skipped:

python episode-tool.py --guid "episode-guid" --skip --config config.json

Reprocess an Episode:

python episode-tool.py --guid "episode-guid" --reprocess --config config.json

Delete an Episode:

python episode-tool.py --guid "episode-guid" --delete --config config.json

Adding a Website with `add_website.py`

add_website.py allows you to add content from any website URL directly into the database for processing.

Usage

python add_website.py "https://example.com/article" --config config.json

Positional Arguments:
- url: The URL of the website to add.
Options:
- --config: Path to the configuration JSON file (optional, defaults to config.json).
- --db: Database filename or connection string (overrides the one specified in the config file).

Example

python add_website.py "https://example.com/blog-post" --db episodes.db

This command fetches the content and title from the provided URL using the trafilatura library and adds it to the database with the status set to pending. The content will then be processed the next time you run main.py.
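The fetch step can be approximated with trafilatura as sketched below. The function names and the record fields are hypothetical and not the project's schema; only fetch_url, extract, and extract_metadata are trafilatura calls. Falling back to the URL when no title is found is also an assumption.

```python
from datetime import datetime, timezone


def fetch_article(url):
    """Download a page and return (title, main text) using trafilatura."""
    import trafilatura  # third-party dependency, imported only when needed

    downloaded = trafilatura.fetch_url(url)
    if downloaded is None:
        raise RuntimeError(f"Could not download {url}")
    content = trafilatura.extract(downloaded)
    if not content:
        raise RuntimeError(f"No readable content found at {url}")
    metadata = trafilatura.extract_metadata(downloaded)
    title = metadata.title if metadata and metadata.title else url
    return title, content


def pending_record(url, title, content, now=None):
    """Build a hypothetical database row for an article awaiting processing."""
    now = now or datetime.now(timezone.utc)
    return {
        "guid": url,
        "link": url,
        "title": title,
        "content": content,
        "pub_date": now.isoformat(),
        "status": "pending",
    }
```

A caller would run fetch_article(url), pass the result to pending_record, and store the row; the next run of main.py would then find it among the pending episodes.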

Preprocessing with Regular Expressions

You can specify optional preprocessing regular expressions in your configuration file under the preprocess_regexps key. This feature allows you to define an array of regular expressions and their replacements, which will be applied to both the title and content before converting them to speech.

Example Configuration

"preprocess_regexps": [
  {
    "regexp": " 1-2 ",
    "replacement": " one to two "
  },
  {
    "regexp": "\\bAI\\b",
    "replacement": "Artificial Intelligence"
  }
]
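As an illustration of what these rules do, here is a minimal sketch that applies them in order with Python's re.sub. The function name is made up for the example, and applying the rules sequentially is an assumption. Note that the doubled backslashes in the JSON file become single backslashes once the file is parsed, so "\\bAI\\b" is the regular expression \bAI\b.

```python
import re


def apply_preprocess_regexps(text, rules):
    """Apply each regexp/replacement pair to the text, in the order given."""
    for rule in rules:
        text = re.sub(rule["regexp"], rule["replacement"], text)
    return text


# The same rules as in the configuration example, after JSON parsing.
rules = [
    {"regexp": " 1-2 ", "replacement": " one to two "},
    {"regexp": r"\bAI\b", "replacement": "Artificial Intelligence"},
]

print(apply_preprocess_regexps("Use AI 1-2 times a day, not AIR.", rules))
# Use Artificial Intelligence one to two times a day, not AIR.
```

The word-boundary markers keep the second rule from touching words that merely contain the letters "AI", such as "AIR".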

Is this free?

This project is free to use, modify, etc. It is free and open source software.

I invested quite a lot of work into this project and the related projects that made the text-to-speech conversion possible. I ask you to leave the generated value4value block intact (you can add your splits via config).

If you found this useful, I would appreciate it if you returned the value: pay what it's worth to you.
