Threat Synth is a little project that I threw together to crawl several of my favorite sites via their RSS feeds and turn what they publish into summarized, digestible information. There is so much information out there, and I have only so much time. I do not have time to read 10,000-character articles about a single thing; there are too many things!
How it works
Threat Synth is powered by Python and Ollama (offline AI). Basically, I get a summary of the page's content, and then have the AI extract key data points about the page (depending on the page content). This can be challenging, because there are so many different types of content on a site, and no one knows what will be posted, even with tagging. So, I have the AI tell me what the content is about first, and then, using that answer, I know which prompt to use to extract the data. I then inject the resulting data (a JSON object) into a Markdown template.
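The two-pass flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `PROMPTS` mapping, the prompt file names, the template fields, and the `fake_model` stand-in (used here in place of a real Ollama call so the example runs offline) are all assumptions.

```python
import json
from string import Template

# Hypothetical mapping from article type to extraction prompt file.
PROMPTS = {
    "vulnerability": "prompts/vulnerability.txt",
    "malware": "prompts/malware.txt",
}
DEFAULT_PROMPT = "prompts/general.txt"

# Hypothetical Markdown template; the real one likely has more fields.
MARKDOWN_TEMPLATE = Template(
    "# $title\n\n"
    "Type: $article_type\n\n"
    "$summary\n\n"
    "Source: $source_link\n"
)


def fake_model(prompt: str, text: str) -> str:
    """Stand-in for the local model call; returns canned JSON."""
    if prompt == "classify":
        kind = "vulnerability" if "CVE" in text else "unknown"
        return json.dumps({"article_type": kind})
    return json.dumps({"title": text[:40], "summary": f"Extracted with {prompt}"})


def process(text: str, source_link: str, model=fake_model) -> str:
    # Pass 1: ask what kind of article this is.
    article_type = json.loads(model("classify", text))["article_type"]
    # Pick the matching prompt, falling back to the general one.
    prompt_file = PROMPTS.get(article_type, DEFAULT_PROMPT)
    # Pass 2: extract the data points with the selected prompt.
    data = json.loads(model(prompt_file, text))
    data.update({"article_type": article_type, "source_link": source_link})
    # Inject the JSON object into the Markdown template.
    return MARKDOWN_TEMPLATE.substitute(data)


print(process("CVE-0000-0000 affects ExampleApp", "https://example.com/post"))
```

Keeping classification and extraction as separate calls means a new content type only needs a new prompt file and one more entry in the mapping.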
Project Status
Currently the project can gather the data, categorize the data, and generate the Markdown.
TODO:
- Pump those Markdown files into this site (powered by Hugo)
- Have Hugo rebuild the site with the new markdown content
- Push the new site data to a server to update the site
- ???
- PROFIT!
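For what it's worth, the first three TODO items could look something like the sketch below. None of this exists in the project yet, so treat it as one possible shape: the `publish` function, the `content/posts` section, and the use of `rsync` for the upload are all assumptions. The `run` parameter is there so the commands can be swapped for a fake during testing.

```python
import shutil
import subprocess
from pathlib import Path


def publish(markdown_files, site_dir, remote, run=subprocess.run):
    """Copy generated Markdown into a Hugo site, rebuild it, and upload it."""
    site = Path(site_dir)
    posts = site / "content" / "posts"  # assumed content section
    posts.mkdir(parents=True, exist_ok=True)

    # Step 1: drop the generated Markdown files into the site's content tree.
    for md in markdown_files:
        shutil.copy2(md, posts / Path(md).name)

    # Step 2: rebuild; Hugo writes the rendered site to <site>/public by default.
    run(["hugo", "--source", str(site)], check=True)

    # Step 3: push the rendered site to the server.
    # The trailing slash tells rsync to copy the directory's contents.
    run(["rsync", "-az", "--delete", f"{site / 'public'}/", remote], check=True)
```

A call might look like `publish(["out/article.md"], "site", "user@example.com:/var/www/site/")`, where the paths and the remote are placeholders.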
So, can I uh…see some code?
Here is a snippet of some of the things. I am still deciding on the license model for this little project. If it’s open source, I’ll make the repo public on my git server.
# have AI figure out what the article type is, and what prompt to use
prompt_to_get = determine_article_type(tech_summary, default_summary_prompt)
# if AI didn't know what prompt to use, or didn't return valid JSON, use the general prompt
if prompt_to_get is None:
    logger.warning(f"AI couldn't determine the prompt to use, using default of {default_summary_prompt}")
    prompt_file = default_summary_prompt
else:
    prompt_file = prompt_to_get[0].get("prompt_to_run")
# summarize with selected prompt
logger.info(f"Analyzing Webpage with Prompt:{prompt_file}. Source: {source_link}")
summary = summarize_entry(tech_summary[0], prompt_file)
if summary is None:
    logger.warning("No final summary found. Skipping this article", extra={"link": source_link, "prompt": prompt_file})
    continue
# add the prompt file and source to the summary
summary[0].update({"template": prompt_file})
summary[0].update({"source_link": source_link})
# Build a markdown summary for this article
markdown = gen_markdown_for_summary(summary[0], prompt_file)
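The fallback in the snippet depends on the classification step returning None whenever the model's reply is unusable. Here is a rough sketch of how that check could be done. The `parse_prompt_choice` helper is hypothetical and is not the project's `determine_article_type`; the list-of-dicts shape is only inferred from the `prompt_to_get[0].get("prompt_to_run")` access in the snippet.

```python
import json


def parse_prompt_choice(raw_reply):
    """Return [{'prompt_to_run': ...}] or None if the reply is unusable."""
    try:
        parsed = json.loads(raw_reply)
    except (json.JSONDecodeError, TypeError):
        # Not JSON at all (chatty reply, empty response, None, ...).
        return None
    # Accept a single object as well as a list of objects.
    if isinstance(parsed, dict):
        parsed = [parsed]
    if not isinstance(parsed, list) or not parsed:
        return None
    first = parsed[0]
    # Valid JSON is not enough; the key the caller reads must be present.
    if not isinstance(first, dict) or not first.get("prompt_to_run"):
        return None
    return parsed
```

Checking for the `prompt_to_run` key up front keeps a valid-but-wrong reply from turning into a `prompt_file` of None further down.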