Project Overview
SalesForecasting is a lightweight Python package for automated time-series forecasting of sales data. It streamlines data preparation, model selection, training, and evaluation behind a simple API.
Main Features
- Automatic ETS-based modeling
  • Supports additive and multiplicative seasonality
  • Selects trend and seasonality components intelligently
- Confidence intervals and uncertainty quantification
- Batch forecasting across multiple series (e.g., by product_id)
- Built-in evaluation (MAE, RMSE) on hold-out sets
- Customizable hyperparameters for advanced tuning
- Pandas-friendly API and DataFrame outputs
Typical Use Cases
- Retail demand planning (daily, weekly or monthly)
- E-commerce SKU-level sales projection
- Revenue forecasting and budget planning
- Inventory optimization and replenishment scheduling
- Automated reporting pipelines and dashboards
Minimal Example
Import the core Forecaster class and run a simple forecast:
from salesforecasting import Forecaster
import pandas as pd
# Load historical data
df = pd.read_csv("data/historical_sales.csv", parse_dates=["date"])
# Train a 6-month ahead forecaster on monthly data
fc = Forecaster(horizon=6, frequency="M")
fc.fit(df, date_col="date", value_col="sales")
# Generate forecasts with confidence intervals
forecast_df = fc.predict()
print(forecast_df.head())
This snippet illustrates SalesForecasting’s end-to-end workflow: load data, fit the model, and retrieve forecasts ready for analysis or visualization.
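For batch forecasting across multiple series (for example, one per product_id), a simple pattern is to fit one Forecaster per group and concatenate the results. The sketch below assumes your historical data carries a product_id column and uses the same Forecaster API as above:
from salesforecasting import Forecaster
import pandas as pd

df = pd.read_csv("data/historical_sales.csv", parse_dates=["date"])

# Fit one forecaster per product and collect the forecasts
results = []
for product_id, group in df.groupby("product_id"):
    fc = Forecaster(horizon=6, frequency="M")
    fc.fit(group, date_col="date", value_col="sales")
    forecast = fc.predict()
    forecast["product_id"] = product_id
    results.append(forecast)

all_forecasts = pd.concat(results)
print(all_forecasts.head())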
Core Concepts & Architecture
This section outlines the key building blocks of the SalesForecasting library, shows how they interact, and explains how to extend or customize each layer.
1. Package Structure
salesforecasting/
├─ config.py            # Loads YAML config into Python objects
├─ cli.py               # Command-line entry point
├─ data/
│  ├─ connector.py      # Data source abstractions (DB, CSV, API)
│  └─ loader.py         # Cleans & assembles raw time-series
├─ features/
│  ├─ base.py           # FeatureGenerator interface
│  └─ transforms.py     # Common feature transforms (lags, rolling)
├─ models/
│  ├─ base.py           # ForecastModel abstract class
│  ├─ arima.py          # ARIMA implementation
│  └─ xgboost.py        # Gradient-boosted tree model
├─ pipelines/
│  ├─ training.py       # TrainingPipeline orchestration
│  └─ forecasting.py    # ForecastPipeline (inference + post-processing)
└─ utils/
   ├─ logging.py        # Centralized logger setup
   └─ metrics.py        # Evaluation metrics (MAE, RMSE)
2. Configuration Driven
salesforecasting uses a single config.yaml to wire data sources, feature sets, models, and runtime settings.
Example config.yaml:
data:
  type: csv
  path: data/sales.csv
  date_col: date
  target_col: sales
features:
  lags: [1, 7, 14]
  rolling_windows: [7, 30]
model:
  type: xgboost
  hyperparameters:
    learning_rate: 0.1
    max_depth: 5
    n_estimators: 100
pipeline:
  forecast_horizon: 30
  train_test_split: 0.8
logging:
  level: INFO
  file: logs/forecast.log
Load config in code:
from salesforecasting.config import load_config
cfg = load_config("config.yaml")
print(cfg.model.type) # e.g. "xgboost"
print(cfg.pipeline.forecast_horizon)
3. Data Ingestion Layer
DataConnector and DataLoader abstract away source specifics:
from salesforecasting.data.connector import CsvConnector
from salesforecasting.data.loader import DataLoader
# 1. Instantiate connector based on config
conn = CsvConnector(
    path=cfg.data.path,
    date_col=cfg.data.date_col,
    target_col=cfg.data.target_col,
)
# 2. Load and clean raw data
loader = DataLoader(connector=conn)
df_raw = loader.load() # pandas DataFrame with date & target
Extend to new sources:
# In data/connector.py
class MyApiConnector(BaseConnector):
    def fetch(self) -> pd.DataFrame:
        # call REST, parse JSON into DataFrame
        ...
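A custom connector then plugs into the same loading path as the built-ins; a minimal usage sketch (the endpoint argument is hypothetical, pass whatever your connector actually needs):
from salesforecasting.data.loader import DataLoader

conn = MyApiConnector(endpoint="https://example.com/api/sales")  # hypothetical argument
loader = DataLoader(connector=conn)
df_raw = loader.load()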
4. Feature Engineering Layer
FeatureGenerators decorate raw series:
from salesforecasting.features.transforms import LagFeature, RollingFeature
from salesforecasting.features.base import FeaturePipeline
# Build pipeline from config
feat_pipe = FeaturePipeline()
for lag in cfg.features.lags:
    feat_pipe.add(LagFeature(lag=lag))
for window in cfg.features.rolling_windows:
    feat_pipe.add(RollingFeature(window=window, agg="mean"))
df_features = feat_pipe.transform(df_raw)
To add a custom feature:
# In features/custom.py
from salesforecasting.features.base import FeatureGenerator

class HolidayIndicator(FeatureGenerator):
    def transform(self, df):
        # Flags weekends (Saturday=5, Sunday=6) as a simple stand-in for holidays
        df["is_holiday"] = df["date"].dt.weekday.isin([5, 6])
        return df

# Register in pipeline:
feat_pipe.add(HolidayIndicator())
5. Modeling Layer
ForecastModel defines the fit/predict API. Example with XGBoost:
from salesforecasting.models.xgboost import XGBoostModel
model = XGBoostModel(**cfg.model.hyperparameters)
model.fit(df_features) # trains on features + target
future_preds = model.predict(horizon=30) # returns pd.Series indexed by date
To implement a new model:
# In models/my_model.py
from salesforecasting.models.base import ForecastModel

class MyModel(ForecastModel):
    def fit(self, df):
        # training logic
        ...

    def predict(self, horizon):
        # inference logic; return a pd.Series indexed by date
        ...
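For a concrete, if simplistic, illustration, here is a naive last-value baseline written against the same assumed interface. It presumes the date and sales column names and the monthly data used in the earlier examples:
# In models/naive.py (illustrative sketch)
import pandas as pd

from salesforecasting.models.base import ForecastModel

class NaiveLastValueModel(ForecastModel):
    """Repeats the last observed value for every future period."""

    def fit(self, df):
        # Remember where the history ends and its final value
        df = df.sort_values("date")
        self._last_date = df["date"].iloc[-1]
        self._last_value = df["sales"].iloc[-1]

    def predict(self, horizon):
        # Build a monthly future index and repeat the last observed value
        future_index = pd.DatetimeIndex(
            [self._last_date + pd.DateOffset(months=i) for i in range(1, horizon + 1)]
        )
        return pd.Series(self._last_value, index=future_index, name="forecast")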
6. Pipeline Orchestration
Pre-built pipelines wire all layers end-to-end:
Training:
from salesforecasting.pipelines.training import TrainingPipeline
train_pipe = TrainingPipeline(config=cfg)
train_pipe.run() # loads data, applies features, trains & persists model
Forecasting:
from salesforecasting.pipelines.forecasting import ForecastPipeline
forecast_pipe = ForecastPipeline(config=cfg)
forecast_df = forecast_pipe.run() # includes predictions + evaluation
print(forecast_df.tail())
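To sanity-check forecast quality on a hold-out set, MAE and RMSE can be computed directly with pandas and NumPy (utils/metrics.py ships equivalents). The actual and prediction column names below are assumptions; substitute whatever columns your forecast_df contains:
import numpy as np

errors = forecast_df["actual"] - forecast_df["prediction"]  # assumed column names
mae = errors.abs().mean()
rmse = np.sqrt((errors ** 2).mean())
print(f"MAE: {mae:.2f}  RMSE: {rmse:.2f}")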
7. Command-Line Interface
Invoke end-to-end workflows without code:
# Train model and save artifacts
salesforecast --config config.yaml train
# Generate forecasts
salesforecast --config config.yaml forecast
8. Extensibility & Customization
- Add connectors, feature generators, or models by subclassing respective base classes.
- Update config.yaml to include your new components.
- Leverage the CLI for rapid experimentation and integration into CI/CD.
By understanding these core layers (data, features, models, pipelines) and their configuration, you can tailor SalesForecasting for custom data sources, novel features, and advanced forecasting algorithms.