# Epic 4: LLM Summarization & Persistence

**Goal:** Integrate with the configured local Ollama instance to generate summaries for successfully scraped article text and fetched comments. Persist these summaries locally. Implement a stage testing utility for summarization.

## Story List

### Story 4.1: Implement Ollama Client Module

- **User Story / Goal:** As a developer, I want a client module to interact with the configured Ollama API endpoint via HTTP, handling requests and responses for text generation, so that summaries can be generated programmatically.
- **Detailed Requirements:**
  - **Prerequisite:** Ensure a local Ollama instance is installed and running, accessible via the URL defined in `.env` (`OLLAMA_ENDPOINT_URL`), and that the model specified in `.env` (`OLLAMA_MODEL`) has been downloaded (e.g., via `ollama pull model_name`). Instructions for this setup should be in the project README.
  - Create a new module: `src/clients/ollamaClient.ts`.
  - Implement an async function `generateSummary(prompt: string, content: string): Promise<string | null>`.
  - Add configuration variables `OLLAMA_ENDPOINT_URL` (e.g., `http://localhost:11434`) and `OLLAMA_MODEL` (e.g., `llama3`) to `.env.example`. Ensure they are loaded via the config module (`src/config.ts`). Update local `.env` with actual values.
  - Inside `generateSummary`:
    - Construct the full prompt string (e.g., `${prompt}\n\n${content}`).
    - Construct the Ollama API request payload (JSON): `{ model: configured_model, prompt: full_prompt, stream: false }`. Refer to the Ollama `/api/generate` documentation.
    - Use native `fetch` to send a POST request to the configured Ollama endpoint + `/api/generate`. Set appropriate headers (`Content-Type: application/json`). Set a reasonable timeout (e.g., 1-2 minutes, as LLM generation can be slow).
    - Handle `fetch` errors (network, timeout) using `try...catch`.
    - Check `response.ok`. If not OK, log the status/error and return `null`.
    - Parse the JSON response from Ollama. Extract the generated text (typically in the `response` field).
    - Check for potential errors within the Ollama response structure itself.
    - Return the extracted summary string on success. Return `null` on any failure.
  - Log key events: initiating request (mention model), receiving response, success, failure reasons, and potentially request/response time, using the logger.
  - Define necessary TypeScript types for the Ollama request payload and expected response structure. (A sketch of the client follows the ACs below.)
- **Acceptance Criteria (ACs):**
  - AC1: The `ollamaClient.ts` module exists and exports `generateSummary`.
  - AC2: `OLLAMA_ENDPOINT_URL` and `OLLAMA_MODEL` are defined in `.env.example`, loaded via config, and used by the client.
  - AC3: `generateSummary` sends a correctly formatted POST request (model, full prompt, stream: false) to the configured Ollama endpoint/path using native `fetch`.
  - AC4: Network errors, timeouts, and non-OK API responses are handled gracefully, logged, and result in a `null` return (given the Prerequisite Ollama service is running).
  - AC5: A successful Ollama response is parsed correctly, and the generated text is extracted and returned as a string.
  - AC6: Unexpected Ollama response formats or internal errors are handled, logged, and result in a `null` return.
  - AC7: Logs provide visibility into the client's interaction with the Ollama API.
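A minimal sketch of what `ollamaClient.ts` could look like, assuming a Node 18+ runtime (global `fetch` and `AbortSignal.timeout`) and the `config` and `logger` modules established in earlier epics; the exact shapes of those modules are assumptions, not requirements:

```typescript
// src/clients/ollamaClient.ts -- illustrative sketch, not the final implementation.
// Assumes Node 18+ and config/logger modules from Epics 1-3 with these field names.
import { config } from '../config';
import { logger } from '../logger';

interface OllamaGenerateRequest {
  model: string;
  prompt: string;
  stream: boolean;
}

interface OllamaGenerateResponse {
  response?: string; // generated text on success
  error?: string;    // Ollama reports some failures in the response body
}

const REQUEST_TIMEOUT_MS = 120_000; // LLM generation can be slow; allow up to 2 minutes

export async function generateSummary(prompt: string, content: string): Promise<string | null> {
  const fullPrompt = `${prompt}\n\n${content}`;
  const payload: OllamaGenerateRequest = {
    model: config.OLLAMA_MODEL,
    prompt: fullPrompt,
    stream: false, // request a single JSON body instead of a token stream
  };

  logger.info(`Requesting summary from Ollama (model: ${config.OLLAMA_MODEL})`);
  const startedAt = Date.now();

  try {
    const response = await fetch(`${config.OLLAMA_ENDPOINT_URL}/api/generate`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(payload),
      signal: AbortSignal.timeout(REQUEST_TIMEOUT_MS),
    });

    if (!response.ok) {
      logger.error(`Ollama returned HTTP ${response.status}`);
      return null;
    }

    const data = (await response.json()) as OllamaGenerateResponse;
    if (data.error || typeof data.response !== 'string') {
      logger.error(`Unexpected Ollama response: ${data.error ?? 'missing "response" field'}`);
      return null;
    }

    logger.info(`Ollama summary received in ${Date.now() - startedAt}ms`);
    return data.response;
  } catch (err) {
    // Covers network failures and AbortSignal timeouts alike.
    logger.error(`Ollama request failed: ${(err as Error).message}`);
    return null;
  }
}
```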
---

### Story 4.2: Define Summarization Prompts

- **User Story / Goal:** As a developer, I want standardized base prompts for generating article summaries and HN discussion summaries, ensuring consistent instructions are sent to the LLM.
- **Detailed Requirements:**
  - Define two string constants or configuration variables for the prompts:
    - `ARTICLE_SUMMARY_PROMPT`: e.g., "Summarize the key points, arguments, and conclusions of the following article text concisely:"
    - `DISCUSSION_SUMMARY_PROMPT`: e.g., "Summarize the main themes, diverse viewpoints, key insights, and overall sentiment expressed in the following Hacker News comments:"
  - Store these prompts in a suitable location (e.g., a new `src/summarizer/prompts.ts` module or within `src/config.ts`). Make them easily accessible to the main workflow (see the sketch after the ACs below).
- **Acceptance Criteria (ACs):**
  - AC1: The `ARTICLE_SUMMARY_PROMPT` constant/variable is defined with appropriate instructional text.
  - AC2: The `DISCUSSION_SUMMARY_PROMPT` constant/variable is defined with appropriate instructional text.
  - AC3: These prompts are exported or otherwise made available for use in the main workflow.
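A possible shape for this module, assuming the `src/summarizer/prompts.ts` option; the prompt wording below simply mirrors the examples above and is expected to be tuned:

```typescript
// src/summarizer/prompts.ts -- illustrative sketch.
// Base instructions prepended to article text / formatted comments before
// they are passed to generateSummary (Story 4.1).

export const ARTICLE_SUMMARY_PROMPT =
  'Summarize the key points, arguments, and conclusions of the following article text concisely:';

export const DISCUSSION_SUMMARY_PROMPT =
  'Summarize the main themes, diverse viewpoints, key insights, and overall sentiment ' +
  'expressed in the following Hacker News comments:';
```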
---

### Story 4.3: Integrate Summarization into Main Workflow

- **User Story / Goal:** As a developer, I want to integrate the Ollama client into the main workflow to generate summaries for each story's scraped article text (if available) and fetched comments.
- **Detailed Requirements:**
  - Modify the main execution flow in `src/index.ts`.
  - Import `ollamaClient.generateSummary` and the defined `ARTICLE_SUMMARY_PROMPT`, `DISCUSSION_SUMMARY_PROMPT`.
  - Within the main loop iterating through stories (after article scraping/persistence in Epic 3):
    - **Article Summary Generation:**
      - Check if the `story` object has non-null `articleContent`.
      - If yes: log "Attempting article summarization for story {storyId}", call `await generateSummary(ARTICLE_SUMMARY_PROMPT, story.articleContent)`, store the result (string or null) as `story.articleSummary`, and log success/failure.
      - If no: set `story.articleSummary = null` and log "Skipping article summarization: No content".
    - **Discussion Summary Generation:**
      - Check if the `story` object has a non-empty `comments` array.
      - If yes:
        - Format the `story.comments` array into a single text block suitable for the LLM prompt (e.g., concatenating `comment.text` with separators like `---`). *Note: Be aware of potential LLM context window limits; for MVP, concatenate all fetched comments but log a warning if the total character count is very high (e.g., > 10000 characters).*
        - Log "Attempting discussion summarization for story {storyId}".
        - Call `await generateSummary(DISCUSSION_SUMMARY_PROMPT, formattedCommentsText)`.
        - Store the result (string or null) as `story.discussionSummary`. Log success/failure.
      - If no: set `story.discussionSummary = null` and log "Skipping discussion summarization: No comments".
- **Acceptance Criteria (ACs):**
  - AC1: Running `npm run dev` executes steps from Epics 1-3, then attempts summarization using the Ollama client.
  - AC2: Article summary is attempted only if `articleContent` exists for a story.
  - AC3: Discussion summary is attempted only if `comments` exist for a story.
  - AC4: `generateSummary` is called with the correct prompts and corresponding content (article text or formatted comments).
  - AC5: Logs clearly indicate the start, success, or failure (including null returns from the client) of both article and discussion summarization attempts per story.
  - AC6: Story objects in memory now contain `articleSummary` (string/null) and `discussionSummary` (string/null) properties.

---

### Story 4.4: Persist Generated Summaries Locally

- **User Story / Goal:** As a developer, I want to save the generated article and discussion summaries (or null placeholders) to a local JSON file for each story, making them available for the email assembly stage.
- **Detailed Requirements:**
  - Define the structure for the summary output file: `{storyId}_summary.json`. Content example: `{ "storyId": "...", "articleSummary": "...", "discussionSummary": "...", "summarizedAt": "ISO_TIMESTAMP" }`. Note that `articleSummary` and `discussionSummary` can be `null`.
  - Import `fs` and `path` in `src/index.ts` if needed.
  - In the main workflow loop, after *both* summarization attempts (article and discussion) for a story are complete:
    - Create a summary result object containing `storyId`, `articleSummary` (string or null), `discussionSummary` (string or null), and the current ISO timestamp (`new Date().toISOString()`).
    - Get the full path to the date-stamped output directory.
    - Construct the filename: `{storyId}_summary.json`.
    - Construct the full file path using `path.join()`.
    - Serialize the summary result object to JSON (`JSON.stringify(..., null, 2)`).
    - Use `fs.writeFileSync` to save the JSON to the file, wrapping the call in `try...catch`.
  - Log the successful saving of the summary file or any file-writing errors. (A sketch of the combined summarize-and-persist loop follows the ACs below.)
- **Acceptance Criteria (ACs):**
  - AC1: After running `npm run dev`, the date-stamped output directory contains 10 files named `{storyId}_summary.json`.
  - AC2: Each `_summary.json` file contains valid JSON adhering to the defined structure.
  - AC3: The `articleSummary` field contains the generated summary string if successful, otherwise `null`.
  - AC4: The `discussionSummary` field contains the generated summary string if successful, otherwise `null`.
  - AC5: A valid ISO timestamp is present in the `summarizedAt` field.
  - AC6: Logs confirm successful writing of each summary file or report file system errors.
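A condensed sketch of how the Story 4.3 and 4.4 steps could slot into the main loop in `src/index.ts`. The `story` field names (`storyId`, `articleContent`, `comments`), the `outputDir` variable, and the comment shape are assumptions carried over from Epics 1-3, and the loop is assumed to run inside the project's async entry point:

```typescript
// src/index.ts (excerpt) -- illustrative sketch of the summarize-and-persist steps.
// `stories`, `outputDir`, and `logger` come from earlier epics; names are placeholders.
import * as fs from 'fs';
import * as path from 'path';
import { generateSummary } from './clients/ollamaClient';
import { ARTICLE_SUMMARY_PROMPT, DISCUSSION_SUMMARY_PROMPT } from './summarizer/prompts';

const MAX_COMMENT_CHARS = 10_000; // warn above this; context windows are finite

for (const story of stories) {
  // --- Article summary (Story 4.3) ---
  if (story.articleContent) {
    logger.info(`Attempting article summarization for story ${story.storyId}`);
    story.articleSummary = await generateSummary(ARTICLE_SUMMARY_PROMPT, story.articleContent);
  } else {
    story.articleSummary = null;
    logger.info('Skipping article summarization: No content');
  }

  // --- Discussion summary (Story 4.3) ---
  if (story.comments && story.comments.length > 0) {
    const formattedCommentsText = story.comments
      .map((c: { text: string }) => c.text)
      .join('\n---\n');
    if (formattedCommentsText.length > MAX_COMMENT_CHARS) {
      logger.warn(`Comments total ${formattedCommentsText.length} chars; may exceed context window`);
    }
    logger.info(`Attempting discussion summarization for story ${story.storyId}`);
    story.discussionSummary = await generateSummary(DISCUSSION_SUMMARY_PROMPT, formattedCommentsText);
  } else {
    story.discussionSummary = null;
    logger.info('Skipping discussion summarization: No comments');
  }

  // --- Persist results (Story 4.4) ---
  const summaryResult = {
    storyId: story.storyId,
    articleSummary: story.articleSummary,
    discussionSummary: story.discussionSummary,
    summarizedAt: new Date().toISOString(),
  };
  const filePath = path.join(outputDir, `${story.storyId}_summary.json`);
  try {
    fs.writeFileSync(filePath, JSON.stringify(summaryResult, null, 2));
    logger.info(`Saved summary file: ${filePath}`);
  } catch (err) {
    logger.error(`Failed to write ${filePath}: ${(err as Error).message}`);
  }
}
```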
---

### Story 4.5: Implement Stage Testing Utility for Summarization

- **User Story / Goal:** As a developer, I want a separate script/command to test the LLM summarization logic using locally persisted data (HN comments, scraped article text), allowing independent testing of prompts and Ollama interaction.
- **Detailed Requirements:**
  - Create a new standalone script file: `src/stages/summarize_content.ts`.
  - Import necessary modules: `fs`, `path`, `logger`, `config`, `ollamaClient`, prompts.
  - The script should:
    - Initialize the logger and load configuration (Ollama endpoint/model, prompts, output dir).
    - Determine the target date-stamped directory path.
    - Find all `{storyId}_data.json` files in the directory.
    - For each `storyId` found:
      - Read `{storyId}_data.json` to get comments. Format them into a single text block.
      - *Attempt* to read `{storyId}_article.txt`. Handle file-not-found gracefully (it means the article wasn't scraped). Store content or null.
      - Call `ollamaClient.generateSummary` for the article text (if not null) using `ARTICLE_SUMMARY_PROMPT`.
      - Call `ollamaClient.generateSummary` for the formatted comments (if comments exist) using `DISCUSSION_SUMMARY_PROMPT`.
      - Construct the summary result object (with summaries or nulls, and a timestamp).
      - Save the result object to `{storyId}_summary.json` in the same directory (using the logic from Story 4.4), overwriting it if it exists.
    - Log progress (reading files, calling Ollama, saving results) for each story ID.
  - Add the script to `package.json`: `"stage:summarize": "ts-node src/stages/summarize_content.ts"`.
- **Acceptance Criteria (ACs):**
  - AC1: The file `src/stages/summarize_content.ts` exists.
  - AC2: The script `stage:summarize` is defined in `package.json`.
  - AC3: Running `npm run stage:summarize` (after `stage:fetch` and `stage:scrape` runs) reads `_data.json` files and attempts to read `_article.txt` files from the target directory.
  - AC4: The script calls the `ollamaClient` with the correct prompts and content derived *only* from the local files (requires the Ollama service to be running per the Story 4.1 prerequisite).
  - AC5: The script creates/updates `{storyId}_summary.json` files in the target directory reflecting the results of the Ollama calls (summaries or nulls).
  - AC6: Logs show the script processing each story ID found locally, interacting with Ollama, and saving results.
  - AC7: The script does not call the Algolia API or the article scraper module.

## Change Log

| Change                   | Date       | Version | Description                         | Author |
| ------------------------ | ---------- | ------- | ----------------------------------- | ------ |
| Added Ollama Prereq Note | 2025-05-04 | 0.2     | Added note about local Ollama setup | 2-pm   |
| Initial Draft            | 2025-05-04 | 0.1     | First draft of Epic 4               | 2-pm   |