safebooru 2021.06 rip + addons from yande-re, gelbooru, chan-sankakucomplex, zerochan etc
Category:
Date:
2021-07-16 01:59 UTC
Leechers:
1
File size:
202.1 GiB
Completed:
29
Info hash:
40e262cd4d601c68a008474eb46b25044e0c7648
Another **volume V2021B for the interval 03.2021-06.2021** in the series of composite safebooru-based rips:
[12.2020 - 03.2021](https://nyaa.iss.one/view/1376135) volume V2021A - most of its description is replicated here
[09.2020 - 12.2020](https://nyaa.iss.one/view/1340980) volume V2020D
[06.2020 - 09.2020](https://nyaa.iss.one/view/1291697) volume V2020C
[02.2020 - 05.2020](https://nyaa.iss.one/view/1265063) volume V2020B
[08.2019 - 01.2020](https://nyaa.iss.one/view/1227047) volume V2020A
[11.2018 - 08.2019](https://nyaa.iss.one/view/1202653) volume V2019
and some earlier releases of **[BOORU-CHARS OPEN DATASET](https://nyaa.iss.one/view/1384820)**
**These rips are not intended to be "complete and maximum quality" but rather a "representative best of", to help users
not lose an interesting fandom, artist or even a single prominent picture and get everything in a few clicks**
Sources used (_priorities high to low when deduplicating_):
* safebooru.org (ID 34xxxxx) **letter S** in archive/folder name
* yande.re (_with some questionable images in separate Q-folders_) **letter Y**
* gelbooru.com (a little bit NSFW in Q) **G**
* anime-pictures.net **A**
* konachan.com (with Q) **K**
* zerochan.net **Z**
* chan.sankakucomplex.com (with Q) **C** (impossible to grab at the moment)
* e-shuushuu.net **E** (will be deprecated in the next release)
**132,643** images sorted and zipped according to aspect ratio (dimensions to folders), _priorities high to low_:
- **44253** "artbook pages" **7x10 (+/- 4%)**
- **18643** "wide pages" **3x4 (+/- 10%)**
- **22390** "squares" **1x1 (+/- 20%)**
- **27409** "wallpapers and computer screens" **3x2 (+/- 40%)**
- **19948** "high pages" **2x3 (+/- 40%)** folder name contains 1x2
and also by _**source**_ and (sometimes) _**ID range**_, as mentioned in the _**folder/archive name**_.
You can browse pictures directly in the archives with FastStone MaxView or something like it.
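The bucketing above can be sketched in Python as follows. This is only an illustration, not the script used for the release: it assumes each tolerance is relative to the nominal width/height ratio and that buckets are tried in the listed priority order, so the first match wins. The function name and the shortened bucket labels are hypothetical.

```python
# Nominal width/height ratio and relative tolerance per bucket, in priority order.
# Assumption: tolerances are relative to the nominal ratio (e.g. 0.7 +/- 4%).
BUCKETS = [
    ("artbook pages", 7 / 10, 0.04),
    ("wide pages", 3 / 4, 0.10),
    ("squares", 1 / 1, 0.20),
    ("wallpapers", 3 / 2, 0.40),
    ("high pages", 2 / 3, 0.40),
]

def classify(width, height):
    """Return the first bucket whose ratio range contains width/height, else None."""
    ratio = width / height
    for name, nominal, tol in BUCKETS:
        if nominal * (1 - tol) <= ratio <= nominal * (1 + tol):
            return name
    return None

print(classify(1400, 2000))  # ratio 0.70
print(classify(1920, 1080))  # ratio ~1.78
```

Under this reading the buckets together span ratios from 0.4 (2x3 minus 40%) to 2.1 (3x2 plus 40%), which agrees with the aspect ratio limits 0.4 .. 2.1 given in the transformations list.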
File name structure: **%website% - %id% - %up_to_3_copyrights% ~ %up_to_5_characters% (%up_to_2_artists%).%ext%** where
- %copyright%, %character% and %artist% can be used as search filters on the source booru
- %website% + %id% is unique and can also be used to build the direct booru URL
so you can extract subsets of interest with xcopy (from already unzipped images) or with 7z (straight from the release archives), e.g.
```
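rem Inside a .bat file; at an interactive prompt use %F instead of %%F
for %%F in ("d:\Safebooru 2021b\*.zip") do 7z x -r -o"e:\sortarea\" "%%F" *sword*art*online*
rem Quote the source path because it contains a space
xcopy /s "d:\Safebooru 2021b\*sword*art*online*" e:\sortarea\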
```
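For anything beyond wildcard matching, the file name structure can be split into fields. The following is a minimal sketch, assuming the id is numeric and all fields are present. How several copyrights, characters or artists are separated inside one field is not stated here, so each field is returned as a raw string. `parse_name` and the sample file name are hypothetical.

```python
import re

# Pattern for "%website% - %id% - %copyrights% ~ %characters% (%artists%).%ext%".
# Assumptions: the id is numeric and every field is present; separators inside
# a field are unknown, so each field is kept as one raw string.
NAME_RE = re.compile(
    r"^(?P<website>.+?) - (?P<id>\d+) - (?P<copyrights>.*?) ~ "
    r"(?P<characters>.*?) \((?P<artists>[^()]*)\)\.(?P<ext>\w+)$"
)

def parse_name(file_name):
    """Split a release file name into its fields, or return None if it does not match."""
    m = NAME_RE.match(file_name)
    return m.groupdict() if m else None

# Hypothetical file name, made up to follow the pattern.
info = parse_name("safebooru - 3400001 - some series ~ some character (some artist).jpg")
print(info["website"], info["id"], info["copyrights"])
```

The artist group is anchored to the last pair of parentheses before the extension, so a character name that itself contains parentheses stays in the characters field.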
**Transformations and filters:**
- initially filtered Mpixels >= 1.2, width >= 900, height >= 900
- PNG converted to JPG (quality 94%), no animations
- downsized to at most 60 MPix and/or 9000 px max side; strip-shaped images dropped or adjusted to aspect ratio 0.4 .. 2.1
- manually (yep, plenty of ~~hand~~job behind this release)
- comic and 4koma, segmented scans and overtexted covers filtered out
- real-life photos, no-character landscapes, most line art and primitive chibi thrown away
- too explicit images (uncensored nipples or vulva, obvious hints at adult actions etc.) excluded from "questionable" downloads
- crops done (sometimes as frame splitting) where the background was large and plain or dirty, most artbooks de-bordered
- occasionally gamma correction, denoising and other nontrivial improvements made
- carefully deduplicated (with AntiDupl NET up to 4% similarity) along with past releases
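The numeric limits from the automatic part of the list can be written down as a short sketch. It assumes 1 Mpixel means 1,000,000 pixels and reads "and/or" as both limits applying at once. The function names are hypothetical, and the manual steps are of course not covered.

```python
import math

def passes_initial_filter(width, height):
    """Initial filter: at least 1.2 Mpixels and at least 900 px on each side."""
    return width * height >= 1_200_000 and width >= 900 and height >= 900

def downsized(width, height, max_pixels=60_000_000, max_side=9000):
    """Return dimensions scaled down (never up) to satisfy both limits."""
    scale = min(
        1.0,
        max_side / max(width, height),
        math.sqrt(max_pixels / (width * height)),
    )
    return int(width * scale), int(height * scale)

print(passes_initial_filter(1000, 1000))  # 1.0 Mpixels, below the 1.2 threshold
print(downsized(18000, 9000))
```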
Some meta-information is included in tab-delimited files:
- **V2021B_files.TSV** post info (size, resolution, MD5 etc.) with concatenated copyright / character / artist tags (opens in Excel)
- **V2021B_tags.TSV** all tags (incl. general and meta), one tag per line (2693574 rows, does not fit into Excel)
With a database you can use SQL and xcopy (on already unzipped images, copy-pasting the query result) to pull out anything you want, e.g.
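Because the tags file is too long for a spreadsheet, it can be streamed row by row instead. The sketch below tallies tag frequencies. The column layout of the TSV is not described here, so the tag column index is a parameter and the sample rows are made up.

```python
import csv
import io
from collections import Counter

def count_tags(tsv_file, tag_column):
    """Stream a tab-delimited file and count values in the given column index."""
    counts = Counter()
    for row in csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) > tag_column:
            counts[row[tag_column]] += 1
    return counts

# Tiny in-memory stand-in for V2021B_tags.TSV; the column layout is hypothetical.
sample = io.StringIO("S\t1\tlong_hair\nS\t1\tsword\nY\t2\tlong_hair\n")
print(count_tags(sample, tag_column=2).most_common(1))
```

For the real file, pass an object opened with `open(path, newline="")`. If the file has a header row, it will be counted like any other row unless it is skipped first.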
```
select 'xcopy "d:\'||torr_path||'\'||file_name||'" e:\sortarea ' xc
from files f
join tags t on t.booru=f.booru and t.fid=f.fid
where t.tag='cleft_of_venus' -- hoops ! we are almost there ...
```
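The same join can be tried without a database server by using SQLite from Python. This is a sketch under assumptions: the tables contain only the columns that appear in the query above, the sample rows are invented, and `files_with_tag` is a hypothetical helper. It returns paths instead of xcopy commands so the result can be fed to any copy tool.

```python
import sqlite3

# In-memory stand-in for the two tables used in the query above. Only the columns
# that appear in that query are modelled; the sample rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table files (booru text, fid integer, torr_path text, file_name text);
    create table tags  (booru text, fid integer, tag text);
""")
conn.executemany("insert into files values (?, ?, ?, ?)", [
    ("S", 1, "folder_a", "one.jpg"),
    ("Y", 2, "folder_b", "two.jpg"),
])
conn.executemany("insert into tags values (?, ?, ?)", [
    ("S", 1, "long_hair"), ("Y", 2, "sword"), ("S", 1, "sword"),
])

def files_with_tag(conn, tag):
    """Return (torr_path, file_name) pairs for every file carrying the tag."""
    sql = """
        select f.torr_path, f.file_name
        from files f
        join tags t on t.booru = f.booru and t.fid = f.fid
        where t.tag = ?
        order by f.torr_path, f.file_name
    """
    return conn.execute(sql, (tag,)).fetchall()

print(files_with_tag(conn, "sword"))
```

The tag is passed as a bound parameter rather than pasted into the SQL text, which avoids quoting problems with tags that contain apostrophes.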
**Disclaimer** - due to the limited time spent on this release, some things are not as good as they should be:
- zip sizes are bigger than optimal
- original PNG info is lost from post metadata
- copyrights/characters-to-franchise relation in tags metadata needs a lot of patches
... diving deep with [neural network](https://github.com/aperveyev/booru_yolo) into [BOORU-CHARS](https://nyaa.iss.one/view/1384820) ...