bio2parquet

Convert genomic data to Parquet, a columnar format, for easier integration with ML tooling and faster data loading in training pipelines.

🚀 Features

FASTA Support: Convert FASTA files (.fasta, .fa, .fna) to Parquet format
Compression Support: Handle both plain and gzipped FASTA files
Hugging Face Integration: Direct upload to Hugging Face Hub for dataset sharing
Python Native: Built with Python for easy integration into your workflow
Type Safety: Full type-hint support for a better development experience
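
To make the FASTA and compression features concrete, here is a minimal standard-library sketch of reading plain or gzipped FASTA into row-like records. It is an illustration only, not bio2parquet's actual implementation: the function names `open_fasta` and `iter_fasta` and the `id` / `description` / `sequence` field names are assumptions made for this example, and the columns bio2parquet really writes may differ.

```python
import gzip
from typing import Dict, Iterator, List, TextIO


def open_fasta(path: str) -> TextIO:
    """Open a FASTA file as text, transparently handling gzip compression."""
    # Detect gzip by its two magic bytes instead of trusting the file extension.
    with open(path, "rb") as raw:
        is_gzipped = raw.read(2) == b"\x1f\x8b"
    if is_gzipped:
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")


def _make_record(header: str, chunks: List[str]) -> Dict[str, str]:
    # The first whitespace-delimited token of the header is treated as the ID;
    # the remainder (if any) is kept as a free-text description.
    parts = header.split(None, 1)
    return {
        "id": parts[0] if parts else "",
        "description": parts[1] if len(parts) > 1 else "",
        "sequence": "".join(chunks),
    }


def iter_fasta(path: str) -> Iterator[Dict[str, str]]:
    """Yield one record dict per FASTA entry (hypothetical schema)."""
    header = None
    chunks: List[str] = []
    with open_fasta(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield _make_record(header, chunks)
                header, chunks = line[1:], []
            elif header is not None:
                # Sequence lines may be wrapped; collect and join them later.
                chunks.append(line)
    if header is not None:
        yield _make_record(header, chunks)
```

Each yielded dict maps naturally onto one row of a columnar table, which is the shape a Parquet writer expects.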

📦 Installation

With pip:

pip install bio2parquet

With uv:

uv tool install bio2parquet

🎯 Quick Start

Command Line Interface

# Basic conversion
bio2parquet fasta input.fasta

# Specify output file
bio2parquet fasta input.fasta -o output.parquet

# Upload to Hugging Face Hub
bio2parquet fasta input.fasta --hf-repo-id username/dataset-name --hf-token your_token
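
For converting many files at once, the CLI calls above can be driven from a short Python script. The sketch below only uses the `bio2parquet fasta <input> -o <output>` form shown in this section; the helper names `build_commands` and `convert_directory` are hypothetical, and it assumes `bio2parquet` is on your `PATH`. It matches only the uncompressed extensions listed under Features.

```python
import subprocess
from pathlib import Path
from typing import List

# Extensions listed in the Features section.
FASTA_SUFFIXES = (".fasta", ".fa", ".fna")


def build_commands(input_dir: str, output_dir: str) -> List[List[str]]:
    """Build one `bio2parquet fasta ... -o ...` command per FASTA file."""
    commands = []
    for path in sorted(Path(input_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in FASTA_SUFFIXES:
            # Name the output after the input file, with a .parquet extension.
            output = Path(output_dir) / (path.stem + ".parquet")
            commands.append(["bio2parquet", "fasta", str(path), "-o", str(output)])
    return commands


def convert_directory(input_dir: str, output_dir: str) -> None:
    """Run the conversions one by one, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for command in build_commands(input_dir, output_dir):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it easy to print and inspect the commands before running anything.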

Python API

from bio2parquet import create_dataset_from_fasta

# Convert FASTA to Parquet
dataset = create_dataset_from_fasta("input.fasta")
dataset.to_parquet("output.parquet")

# Upload to Hugging Face Hub
dataset.push_to_hub("username/dataset-name", token="your_token")
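
Rather than hard-coding a token in scripts, you may prefer to read it from the environment. The following is a small sketch under that assumption: `resolve_hf_token` is a hypothetical helper, not part of bio2parquet, and `HF_TOKEN` is the environment variable name conventionally used by Hugging Face tooling.

```python
import os


def resolve_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail loudly."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No Hugging Face token found; set the {env_var} environment variable."
        )
    return token


# Usage with the upload call shown above (left commented out so this
# snippet runs without bio2parquet installed):
# dataset.push_to_hub("username/dataset-name", token=resolve_hf_token())
```

This keeps the token out of source files and shell history.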

📚 Documentation

For detailed documentation, visit our documentation site.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide (CONTRIBUTING.md) for details.

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
