List

List hooks allow you to retrieve file and directory listings from any supported filesystem connection. This is particularly useful for discovering files, validating directory contents, or preparing for batch operations.

Configuration

```yaml
- type: list
  location: "aws_s3/path/to/directory"  # Required: location string
  recursive: false   # Optional: list files/folders recursively (default: false)
  only: files        # Optional: "files" or "folders" -- list only files or only folders
  on_failure: abort  # Optional: abort / warn / quiet / skip
  id: my_id          # Optional; generated if omitted. Use a `log` hook with {runtime_state} to view state.
```

Properties

| Property | Required | Description |
| --- | --- | --- |
| `location` | Yes | The location string. Contains the connection name and path. |
| `recursive` | No | Whether to list files recursively in subdirectories (default: false) |
| `only` | No | List only files or only folders |
| `on_failure` | No | What to do if the listing fails (abort/warn/quiet/skip) |
| `id` | No | Identifier for referencing the hook's output in later steps; generated if omitted |

Output

When the list hook executes successfully, it returns the following output that can be accessed in subsequent hooks:

```yaml
status: success  # Status of the hook execution
result:          # Array of file/directory entries
  - name: "file1.txt"  # Name of the file/directory
    path: "path/to/file1.txt"  # Full path
    location: "my_conn/path/to/file1.txt"  # Location string
    uri: "s3://bucket/path/to/file1.txt"  # Full URI
    is_file: true  # Whether entry is a file
    is_dir: false  # Whether entry is a directory
    size: 1024  # Size in bytes
    created_at: "2023-01-01T00:00:00Z"  # Creation timestamp if available
    created_at_unix: 1672531200  # Creation unix timestamp if available
    updated_at: "2023-01-02T00:00:00Z"  # Last modified timestamp if available
    updated_at_unix: 1672617600  # Last modified unix timestamp if available
path: "path/to/directory"  # The listed path
connection: "aws_s3"  # The connection used
```

You can access these values in subsequent hooks using JMESPath-style references:

  • `{state.hook_id.status}` - Status of the hook execution
  • `{state.hook_id.result}` - Array of file/directory entries
  • `{state.hook_id.path}` - The listed path
  • `{state.hook_id.connection}` - The connected connection name
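
For instance, a minimal sketch that logs a listing's status and path in a follow-up `log` hook (the hook `id` and bucket path here are illustrative):

```yaml
hooks:
  pre:
    - type: list
      id: file_list
      location: "aws_s3/data/"

    # Reference the list hook's output via its id.
    - type: log
      message: "Listing of {state.file_list.path} on {state.file_list.connection}: {state.file_list.status}"
```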

Examples

Process Files in Directory

List files and process them in a group:

```yaml
hooks:
  pre:
    - type: list
      id: file_list
      location: "aws_s3/data/{run.stream.name}/"
      recursive: true

    - type: group
      loop: state.file_list.result
      steps:
        - type: log
          if: loop_value.is_file
          message: "Processing file: {loop_value.name}"
```

Archive Old Files

List and archive files older than a certain date:

```yaml
hooks:
  post:
    - type: list
      id: old_files
      location: "gcs/temp/{run.stream.name}/"
      recursive: true

    - type: group
      loop: state.old_files.result
      steps:
        - type: copy
          if: loop_value.updated_at_unix < timestamp.unix - 7*24*60*60  # older than 7 days
          from: "{loop_value.location}"
          to: "gcs/archive/{timestamp.year}/{timestamp.month}/{loop_value.name}"
```

Size-based Processing

Process files based on their size:

```yaml
hooks:
  pre:
    - type: list
      id: large_files
      location: "aws_s3/uploads/"

    - type: group
      loop: state.large_files.result
      steps:
        - type: log
          if: loop_value.size > 1024*1024  # > 1MB
          message: "Large file detected: {loop_value.name} ({loop_value.size} bytes)"
```

Notes

  • Not all filesystems provide all metadata fields
  • Timestamps may be zero if not supported by the filesystem
  • Directory sizes are typically reported as 0
  • The hook will not fail if the path doesn't exist or is empty
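
Since the hook does not fail on a missing or empty path, you can guard downstream steps explicitly. A hedged sketch using the `check` hook with a JMESPath `length()` expression (see the Check hook page for its exact properties; the `id` and path below are illustrative):

```yaml
hooks:
  pre:
    - type: list
      id: input_files
      location: "aws_s3/uploads/"

    # Abort the run if the listing came back empty.
    - type: check
      check: length(state.input_files.result) > 0
```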

