Convert genomic data to Parquet, a columnar format that plugs into standard data tooling and loads faster in ML training pipelines.
- FASTA Support: Convert FASTA files (.fasta, .fa, .fna) to Parquet format
- Compression Support: Handle both plain and gzipped FASTA files
- Hugging Face Integration: Direct upload to Hugging Face Hub for dataset sharing
- Python Native: Built with Python for easy integration into your workflow
- Type Safety: Full type-hint support for a better development experience
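
To make the FASTA and compression bullets above concrete, here is a minimal sketch of reading plain or gzipped FASTA into row-like records. It is an illustration only, not bio2parquet's actual implementation: the function names `read_fasta_records` and `_make_record` and the `id` / `description` / `sequence` field names are assumptions chosen for this example.

```python
import gzip
from pathlib import Path
from typing import Dict, Iterator, List, Union


def _make_record(header: str, chunks: List[str]) -> Dict[str, str]:
    # Hypothetical record layout: the first whitespace-delimited token of the
    # header is treated as the ID, the remainder as a free-text description.
    parts = header.split(None, 1)
    return {
        "id": parts[0] if parts else "",
        "description": parts[1].strip() if len(parts) > 1 else "",
        "sequence": "".join(chunks),
    }


def read_fasta_records(path: Union[str, Path]) -> Iterator[Dict[str, str]]:
    """Yield one dict per FASTA record from a plain or gzipped file."""
    path = Path(path)

    # Detect gzip by its two magic bytes instead of trusting the extension.
    with path.open("rb") as raw:
        is_gzipped = raw.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open

    header = None
    chunks: List[str] = []
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield _make_record(header, chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence data may be wrapped over several lines.
                chunks.append(line)
        # Emit the final record once the file is exhausted.
        if header is not None:
            yield _make_record(header, chunks)
```

Each yielded dict corresponds to one row of a table with string columns. In a setup where the Hugging Face `datasets` library is installed, a list of such dicts could plausibly be turned into a dataset and written to Parquet, which is roughly the job the tool automates.
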
pip install bio2parquet
With uv:
uv tool install bio2parquet

# Basic conversion
bio2parquet fasta input.fasta
# Specify output file
bio2parquet fasta input.fasta -o output.parquet
# Upload to Hugging Face Hub
bio2parquet fasta input.fasta --hf-repo-id username/dataset-name --hf-token your_token

from bio2parquet import create_dataset_from_fasta
# Convert FASTA to Parquet
dataset = create_dataset_from_fasta("input.fasta")
dataset.to_parquet("output.parquet")
# Upload to Hugging Face Hub
dataset.push_to_hub("username/dataset-name", token="your_token")

For detailed documentation, visit our documentation site.
We welcome contributions! Please see our Contributing Guide for details.
This project is licensed under the MIT License - see the license file for details.