
Increase Performance with Vectorized Memory Access #226

Open
wants to merge 11 commits into
base: master

ByLamacq

Hello,

I changed global memory access from scalar to vector.
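To make "scalar to vector" concrete, here is a minimal host-side sketch of the data layout involved. It is an illustration under assumptions, not the code from this pull request: `Vec4` is a stand-in for CUDA's built-in `uint4` (four 32-bit words, 16-byte aligned), and `Words256`, `Vecs256`, `pack` and `unpack` are hypothetical names. On a GPU, the point of the second layout is that each 16-byte unit can be moved with one wide load instead of four narrow ones.

```cpp
#include <array>
#include <cstdint>

// Stand-in for CUDA's built-in uint4: four 32-bit words, 16-byte aligned.
// (Hypothetical name; on the device you would use uint4 itself.)
struct alignas(16) Vec4 {
    std::uint32_t x, y, z, w;
};

// Scalar layout: a 256-bit value addressed as eight separate 32-bit words.
using Words256 = std::array<std::uint32_t, 8>;

// Vector layout: the same 32 bytes addressed as two 16-byte units.
using Vecs256 = std::array<Vec4, 2>;

// Each Vec4 is exactly one 128-bit unit.
static_assert(sizeof(Vec4) == 16 && alignof(Vec4) == 16,
              "Vec4 must be one aligned 128-bit unit");

// Repack eight scalar words into two four-word vectors.
Vecs256 pack(const Words256& s) {
    return {{ {s[0], s[1], s[2], s[3]}, {s[4], s[5], s[6], s[7]} }};
}

// Recover the eight scalar words from the two vectors.
Words256 unpack(const Vecs256& v) {
    return {{v[0].x, v[0].y, v[0].z, v[0].w,
             v[1].x, v[1].y, v[1].z, v[1].w}};
}
```

The word order is preserved, so `unpack(pack(s))` gives back `s`; only the granularity at which memory is addressed changes.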

Platform: Ubuntu 16.04, GTX 1050 Ti, CUDA 10.1 (top: original)
[Image: GTX1050ti_MasterVsUint4_web]

Platform: Ubuntu 18.04, RTX 2080, CUDA 10.2 (top: original)
[Image: RTX2080_MasterVsUint4]

Best regards,
ByLamacq

@ByLamacq ByLamacq changed the title from "Increase Performance with Global Vectorized Memory Access" to "Increase Performance with Vectorized Memory Access" on Mar 19, 2020
@kpot87

kpot87 commented Apr 7, 2020

Have you found anything on the 2080? How much have you checked?

@ByLamacq
Author

Have you found anything on the 2080? How much have you checked?

I'm not searching, so I can't find anything. For me this is just a programming challenge...

@hamnaz

hamnaz commented Jun 21, 2020

Could you help add these features?
Currently BitCrack runs like this: if the stride is 100,
count + stride + count + stride = 1 + 100 + 1 + 100 = 202 in total.
I am looking for an update with a new switch, --count, defined by the user (--count 200) with stride 100 (the user count is the number of keys checked):
user-count + stride + user-count + stride + user-count = 300 + 100 + 300 + 100 + 300 = 1100 in total.

As an add-on, if --keyspace is 1:3000, a new switch --loop with --count 2000 --stride 100:
user-count + stride + user-count + stride + user-count = 2000 + 100 + 2000 (it reaches the end but keeps counting in a loop from 1, the start key) + 100 + 2000, and the loop continues.
--count would be the number of keys to check before each stride.
I hope this feature will make BitCrack more effective and attractive.
Thanks
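One possible reading of this request can be sketched as follows. This is only an interpretation, not BitCrack code: the function `windowed_keys` and its parameters are hypothetical, the `--count` and `--loop` switches are the ones proposed above rather than existing options, and 64-bit integers stand in for 256-bit keys. The sketch checks `count` consecutive keys, skips the next `stride` keys, and optionally wraps around to the start key.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of BitCrack): lists the keys visited when
// `count` consecutive keys are checked and the next `stride` keys are skipped.
// With `loop` set, positions past `last` wrap around to `first`.
// `limit` caps the number of keys returned so that a looping run terminates.
// Assumes the values are small enough that the additions do not overflow.
std::vector<std::uint64_t> windowed_keys(std::uint64_t first, std::uint64_t last,
                                         std::uint64_t count, std::uint64_t stride,
                                         bool loop, std::size_t limit) {
    std::vector<std::uint64_t> keys;
    if (count == 0 || first > last) return keys;

    const std::uint64_t span = last - first + 1;  // size of the keyspace
    std::uint64_t offset = 0;                     // window start, relative to `first`

    while (keys.size() < limit) {
        for (std::uint64_t i = 0; i < count && keys.size() < limit; ++i) {
            std::uint64_t pos = offset + i;
            if (pos >= span) {
                if (!loop) return keys;  // ran off the end of the keyspace
                pos %= span;             // wrap back toward the start key
            }
            keys.push_back(first + pos);
        }
        offset += count + stride;        // jump over the skipped keys
        if (loop) offset %= span;        // keep the window start inside the keyspace
    }
    return keys;
}
```

For example, with first = 1, last = 3000, count = 3, stride = 2 and no looping, the first seven keys visited would be 1, 2, 3, 6, 7, 8, 11.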

@marcelosantoto

Hello, good morning. If I put several video cards in the same computer, for example 4, would running the program on those 4 cards give greater power and speed or not? I await your comments.

@marssystems

Yes - you will have greater power and speed.

@marcelosantoto

Yes - you will have greater power and speed.

First of all, thank you very much for your answer. Another question: can the video cards be any model? For example, two GTX 1080 Ti 11GB cards, plus about three RX 580 8GB cards, plus an RTX 3060 Ti 8GB. Would that work without problems, or do they all have to be the same model, and all Nvidia or all AMD? I await your answer.

@marssystems

Yes - they can be any Nvidia cards. I don't know about AMD cards.
I use Windows 10 and Nvidia cards with no problems.

@marcelosantoto

Yes - they can be any Nvidia cards. I don't know about AMD cards.
I use Windows 10 and Nvidia cards with no problems.

Again, thank you very much for responding. I will look into adding more video cards to get greater power and speed. May I ask which video cards you use? Have you tried Nvidia GTX, RTX or Quadro cards? Which ones do you recommend?

@marssystems

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

@marcelosantoto

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

Have you had any luck using that much power and speed with BitCrack?

@marssystems

Not yet - I just started.

@marcelosantoto

I use 12 Nvidia P106-100 mining cards and 2 Nvidia Tesla K80's.

Have you had any luck using that much power and speed with BitCrack?

Are they on 2 separate PCs or on 1, like a mining rig?

@marssystems

They are on one PC - an old converted mining rig.

@Uzlopak

Uzlopak commented May 22, 2021

@ByLamacq Is this a patch that could also be applied to OpenCL? Or is this a CUDA-specific optimization?

@BitCrackEvo

It's not really a patch, but yes, it can be applied to OpenCL. AMD GPUs also have specific microcode for vector data loads, so I think this change in the CL code could increase performance.

@Uzlopak

Uzlopak commented May 23, 2021

@BitCrackEvo
My C and C++ skills are limited. Do you have the skills to implement this?

@BitCrackEvo

@Uzlopak
I will try this later, but there are many changes to make.

Currently, the OpenCL code reads an array of structures:
typedef struct {
    uint v[8];
} uint256_t;

It's not very good, but OpenCL is a high-level programming language, so it depends on how the compiler implements it...

But you can also try to do that yourself... It's good training to improve your skills.
I will try after my own BitCrack project. Sorry.
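One common reason an array of structures is considered less than ideal on GPUs can be seen from the index arithmetic alone. The sketch below is an assumption-laden illustration, not code from BitCrack: `aos_index`, `soa_index` and `kWords` are hypothetical names. It compares where word `w` of item `i` lands when the 8-word values are stored one after another (as with the struct above) and when the same word of all items is stored contiguously.

```cpp
#include <cstddef>

// Number of 32-bit words in one 256-bit value (matches uint v[8] above).
constexpr std::size_t kWords = 8;

// Array of structures: the 8 words of one item are contiguous.
constexpr std::size_t aos_index(std::size_t item, std::size_t word) {
    return item * kWords + word;
}

// Structure of arrays: word `word` of all `n` items is contiguous.
constexpr std::size_t soa_index(std::size_t item, std::size_t word, std::size_t n) {
    return word * n + item;
}

// Byte distance between the same word of two neighbouring items
// (4 bytes per 32-bit word).
static_assert((aos_index(1, 0) - aos_index(0, 0)) * 4 == 32,
              "AoS: neighbouring items are 32 bytes apart");
static_assert((soa_index(1, 0, 1024) - soa_index(0, 0, 1024)) * 4 == 4,
              "SoA: neighbouring items are 4 bytes apart");
```

When neighbouring work-items each read the same word index, the first layout makes them touch addresses 32 bytes apart, while the second keeps them adjacent. Whether the compiler can turn either pattern into wide loads depends on the implementation, as noted above.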

@Uzlopak

Uzlopak commented May 24, 2021

Hi @BitCrackEvo

I started to dig deeper. Very interesting. Can you help me with this question on Stack Overflow?

https://stackoverflow.com/questions/67667314/transform-native-c-matrix-multiplication-to-opencl-simd-matrix-multiplication?r=SearchResults

@sigkill

sigkill commented Feb 3, 2022

This made my Jetson Nano about 20% faster.

10 participants