    jobs:
      # Free disk space from the github ubuntu image, as it is very bloated (Android and DotNet are safe to remove, about ~11G)
      # This task overlaps with other runs but should not cause a problem
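For reference, the free-disk step being discussed typically looks something like the sketch below. The paths and sizes are from the hosted `ubuntu-latest` image and are approximate; this is an illustration, not the actual workflow file:

```yaml
free-disk:
  runs-on: ubuntu-latest
  steps:
    - name: Free disk space
      run: |
        # Android SDK (~9G) and .NET (~2G) are preinstalled but unused here
        sudo rm -rf /usr/local/lib/android
        sudo rm -rf /usr/share/dotnet
        df -h /
```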
---
Instead, I would add `needs: free-disk` to ensure that this job runs before the others;
otherwise, if for whatever reason this one does not start, the others might randomly fail.
see https://docs.github.com/en/actions/using-workflows/about-workflows#creating-dependent-jobs
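The suggested dependency would be declared roughly like this (job names are taken from this thread; the step bodies are placeholders):

```yaml
jobs:
  free-disk:
    runs-on: ubuntu-latest
    steps:
      - run: echo "free disk space here"
  ubuntu-fulldbg-gcc:
    # waits for free-disk to finish before starting
    needs: free-disk
    runs-on: ubuntu-latest
    steps:
      - run: echo "build here"
```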
---
Sadly, it's more complex...
It appears that every job runs in an independent worker (which is tied to a different ubuntu-latest image), which means that changes in the worker do not propagate to other jobs, dependent or not.
So basically when I run this:
1 - Workflow starts
2 - free-disk and ubuntu-fulldbg-gcc run, each in a different worker
3 - free-disk ends and its worker ends with it, so the disk changes are lost
With the needs:
1 - Workflow starts
2 - free-disk starts, with its assigned worker, ubuntu-fulldbg-gcc gets into queue
3 - free-disk ends and its worker ends with it. ubuntu-fulldbg-gcc starts with another worker.
4 - We end up in the same place.
Manually deleting those libs solves the problem, but there is a detail preventing it from working here. The rationale behind separating the free-disk step into a different job is that the compilation jobs run inside custom containers, so I don't have direct access to the worker.
As the jobs do not share a worker, and the jobs that need more space run in a custom container, it is impossible to free the space the normal way.
At this point I am trying several solutions:
1 - Trim the KratosUbuntu-22.04 image to carry less software (difficult, as the main offender here is Intel's oneAPI at ~8G, 90% of the image size, and we need it)
2 - Divide the compilation into substeps and clear the intermediate compilation artifacts
3 - Manually mount the docker image and create a custom entry point that executes everything. Feasible, but a ton of work, and it will prevent us from splitting the job into different steps.
4 - Mount the host runner directories that we want to delete into the container, and delete their contents. (God forgive me for suggesting this)
5 - Pray for GitHub to implement pre-step and post-step hooks. (Spoiler: not gonna happen)
I am trying to implement 2, but if that does not work out we will have to rethink the whole process...
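For the record, option 4 could be sketched with the `container.volumes` option, which bind-mounts host runner paths into the job container. The image name and mount points below are assumptions, not the actual configuration:

```yaml
ubuntu-fulldbg-gcc:
  runs-on: ubuntu-latest
  container:
    image: kratosmultiphysics/kratos-image-ci-ubuntu-22-04  # assumed name
    volumes:
      # bind-mount the host's bloated directories so the containerized
      # job can empty them from the inside
      - /usr/local/lib/android:/host/android
      - /usr/share/dotnet:/host/dotnet
  steps:
    - name: Free host disk space from inside the container
      run: rm -rf /host/android/* /host/dotnet/*
```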
---
Ah, so it is kind of a coincidence that the current solution allows running the free-disk job on the same worker?
Regarding your options, I already optimized the image quite a bit regarding size
Also I tried to use custom entry points some time ago and it was not (yet?) supported
So if it works, then kudos for figuring it out
---
No no, this solution does not allow for that; they are different workers (it is still failing in this PR).
Custom entry points are OK now, but then you can only do one job; or multiple, but then you have to handle the volumes manually to transfer data and spin up the container for each job, so I assume it will end up being slow... but yeah, it's possible now, and that's my backup solution if I cannot reorganize the compilation so we can delete artifacts while compiling.
---
We could split the image into two, one base image with all the compilers and one with the intel stuff
The intel build only has the core atm, which takes less space
Only downside is that the gcc builds would not have Pardiso enabled
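The split could be done with a shared base stage in the Dockerfile. The stages and package names below are purely illustrative (and the oneAPI line assumes Intel's apt repository has already been configured), not the actual Kratos CI image:

```dockerfile
# base image: compilers and common tooling shared by all CI jobs
FROM ubuntu:22.04 AS base
RUN apt-get update && apt-get install -y gcc g++ clang cmake python3-dev

# intel variant: only this image carries the ~8G oneAPI install,
# so the gcc/clang jobs can pull the much smaller base image
FROM base AS intel
RUN apt-get update && apt-get install -y intel-hpckit
```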
---
Also, there are multiple versions of mmg and parmmg installed.
@loumalouomega @marcnunezc, can we remove the older versions to save space?
---
Hi!
Currently, the unused versions (which can be safely removed) are:
- `Kratos/scripts/docker_files/docker_file_ci_ubuntu_22_04/DockerFile`, lines 67 to 74 (at f72bc3d)
- `Kratos/scripts/docker_files/docker_file_ci_ubuntu_22_04/DockerFile`, lines 85 to 94 (at f72bc3d)
These were part of an upgrade I did not finish in #10477, since some tests were giving problems. Also, a new version of MMG has been released, so it would be better to start over with the more recent releases.
Do we even need the intel compiler? We don't compile or test any applications with it, and we completely disregard Core test failures in the CI, so it seems to me that we basically don't support ICC. Take a look at this intel CI run as an example.

As a sidenote, the Kratos import prompt seems to be broken for ICC: it thinks it was compiled with clang.
Disclaimer: this might not be too relevant to the problem at hand, but it's worth noting anyway.

I took a look at file sizes in our repo, and found that the worst offenders are:

That's right: none of the 10 largest files in our repository is an

To put this into perspective, the entirety of

To take a look at build times, I compiled

Putting it into perspective: these 3 source files take (a lot) longer to compile than the entirety of
Yes, we need it. Intel is essential on clusters and other proprietary hardware.
Yes, we need to fix this; it's because intel uses clang or gcc internally.
Yes, the contact conditions are AD. The main problem is the number of derivative terms and how they interact with each other.
Alright, but what about the failing core tests? Even though some tests fail, the intel CI job is always marked successful. Part of the CI run I linked earlier:

Just asking because I don't have much experience with targeting intel clusters specifically: what's the performance gap between the intel compiler and GCC/Clang? I've read that numerics used to be much faster on ICC (~10 years ago), but nowadays that's no longer the case.
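For context, two common workflow patterns that make a job report success even though its tests fail are a swallowed exit code and `continue-on-error`. The step names and commands below are purely illustrative, not taken from the actual Kratos workflow:

```yaml
- name: Run core tests
  # '|| true' swallows the exit code, so the step always succeeds
  run: python3 run_tests.py || true

- name: Run core tests (alternative)
  run: python3 run_tests.py
  # the step failure is recorded but does not fail the job
  continue-on-error: true
```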
Doesn't Pardiso have a C interface? I thought it didn't matter what I compile my app with if I'm just linking against it.
The problem is not the source files. Trying to solve the problem with those is like trying to drain the sea with a spoon. The main problem is the compilation artifacts. For example, taking Fluid + Structural with Debug (FullDebug builds are larger, but I don't have the files here):

If you want a more detailed view per app, things look like this (for the build folder, which corresponds to the 4.4G above):

It is easy to see where the problem comes from if you scale this to ~30 apps. Taking a conservative 500Mb per app, that's 15G of compilation artifacts; put another 5G in terms of shared libs, so a total of
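A quick way to reproduce this kind of per-app breakdown locally is shown below. The build path is an assumption; adjust it to your own tree:

```shell
# rank per-application artifact sizes in the build folder, largest first
du -sk build/applications/*/ 2>/dev/null | sort -rn | head -n 10
```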
True, but I think the problem is more along the lines of not having an alternative to ICC on Intel clusters (if they are from Intel, my experience is that you have gcc 4.8.5, or 5 if you are lucky, icc, and nothing else).
I have an idea of how to remove the intel stuff from docker; I will try after work.
That's true, and I wouldn't dare propose that people use shorter class/function/variable names and write more concise code just to reduce size. That's the reason for my disclaimer at the beginning; I just wanted to share what I found while digging through our repo.
I'll just point out that artifact sizes are directly influenced by the sizes of translation units, so chunky source files necessarily lead to fat object files and libraries. The rampant tendency to implement everything in headers probably doesn't help either.

Is there any chance of getting a privately hosted CI? I don't know much about the administration of Kratos, but it seems to me that anything we do to fix this issue will be a mere temporary solution if Kratos continues expanding as a project.
GitHub offers the use of privately hosted runners, but then someone has to pay for and maintain them... We started setting this up only a few months before GitHub launched Actions, but then quickly abandoned it due to the overhead. I think in the short/mid-term it makes sense to improve the CI (e.g. not always building everything, doing it in steps, etc...)
Also, security is a nightmare, since basically you are opening the doors to whatever machine you are using to host the runner.
Technically, we can also pay GitHub to use their larger runners (faster, more memory, etc.), but I don't think they are cheap.