Clean News — Modus Operandi

Project Links:

Clean News IL (RU): https://t.me/cleannews_il

Author: @gitstashapply

Clean News:

Clean News is a news-curation project that collects articles from publicly accessible sources, removes political and emotional content as well as promotional or non-news material, and publishes concise, factual updates in Russian. The project relies on AI-based text processing with minimal manual oversight, so each article retains only its essential information in a neutral presentation. A direct link to the original source is always included for additional context.


Important Statement:

Public Sources Notice

Source Attribution

State of the product


1. Data Ingestion

2. Preliminary Validation

ShouldSkip Check: A system prompt called shouldSkipMessagePrompt determines whether the incoming text is a valid news article or should be skipped.

shouldSkipMessagePrompt

      export const shouldSkipMessagePrompt = {
        role: "system",
        content: `
          You are a text validator. Your task is to determine if the provided text is a valid news article.
          At the end of the text you will receive "Has media: " followed by true or false, which indicates whether the text contains media.
          A valid news article typically contains information about events, reports, live reports, or announcements that are informative and relevant to the public.
      
          Exclude content that is:
          - Text whose general essence is limited to advertisements or promotional material
          - Questions or user inquiries
          - Opinions, personal stories, or irrelevant text
          - Spam
      
          Do not exclude content:
          - If the text primarily communicates valid news info.
          - If the article contains "Has media: true" and the general essence of the text is not limited to advertising material.
          - If the article contains an advertisement call to action for subscribing, but it is not the main essence of the text.
      
          Return a JSON object with the following schema:
          {
            "skip": true | false
          }
      
          - Set "skip" to true if the text is NOT a valid news article.
          - Set "skip" to false if the text is a valid news article.
      
          Only return the JSON response without any additional commentary or explanation.
        `,
      };
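
The check above could be wired up roughly as follows. This is a minimal sketch, not the project's actual code: the helper names buildValidatorInput and parseSkipDecision are hypothetical, and the model call itself is left out. It only shows how the "Has media: " marker the prompt expects might be appended to the text, and how the JSON reply might be validated before it is trusted.

```typescript
// Hypothetical helpers for illustration; these names are not from the project.

// Shape of the reply requested by shouldSkipMessagePrompt.
interface SkipDecision {
  skip: boolean;
}

// Append the "Has media: true|false" marker that the prompt expects
// at the end of the text.
function buildValidatorInput(text: string, hasMedia: boolean): string {
  return `${text.trim()}\n\nHas media: ${hasMedia}`;
}

// Parse the model's raw reply. Returns null when the reply is not valid JSON
// or does not contain a boolean "skip" field.
function parseSkipDecision(raw: string): SkipDecision | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed === "object" && parsed !== null) {
      const skip = (parsed as Record<string, unknown>).skip;
      if (typeof skip === "boolean") {
        return { skip };
      }
    }
  } catch {
    // Malformed JSON: fall through and report failure to the caller.
  }
  return null;
}
```

A caller would then decide what to do with a null result. One conservative option (an assumption, not something the source states) is to treat an unparseable reply as a skip: `const decision = parseSkipDecision(reply) ?? { skip: true };`.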
        

3. Content Filtering & Editing

4. Structured Output

JSON Output: The final JSON must include:

{
  "content": "",
  "trust_factor": "",
  "explanation": ""
}
    

No Extra Data: The output should not include any additional commentary beyond the specified fields.
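
One way to enforce this output contract in code is sketched below. It assumes all three fields are strings, as the template above suggests. The CleanNewsOutput type and the parseCleanNewsOutput function are hypothetical names introduced for illustration, not part of the project.

```typescript
// Hypothetical type mirroring the three fields listed in the JSON template.
interface CleanNewsOutput {
  content: string;
  trust_factor: string;
  explanation: string;
}

const REQUIRED_FIELDS = ["content", "trust_factor", "explanation"] as const;

// Returns the parsed output, or null if the reply is not valid JSON,
// is missing a field, has a non-string field, or carries extra keys.
function parseCleanNewsOutput(raw: string): CleanNewsOutput | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return null;
  }
  const record = parsed as Record<string, unknown>;
  // Exactly three keys plus three present required fields means no extras.
  if (Object.keys(record).length !== REQUIRED_FIELDS.length) {
    return null;
  }
  for (const field of REQUIRED_FIELDS) {
    if (typeof record[field] !== "string") {
      return null;
    }
  }
  return {
    content: record.content as string,
    trust_factor: record.trust_factor as string,
    explanation: record.explanation as string,
  };
}
```

Rejecting replies with extra keys is a strict reading of the "No Extra Data" rule. A more lenient variant could simply drop unknown keys instead.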

5. Publication

6. Continuous Improvement