Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix for Default service config not honored if initial name resolution fails #11368

Open
wants to merge 43 commits into
base: master
Choose a base branch
from

Conversation

vinodhabib
Copy link

Fix for defect - #11040

Copy link

linux-foundation-easycla bot commented Jul 10, 2024

@kannanjgithub
Copy link
Contributor

kannanjgithub commented Jul 19, 2024 via email

@Alam-Saran
Copy link

When you say not working, which part is not working? Did you add any test assertions and did they fail? Also what cases are the 3 scenarios you listed each for?

On Fri, Jul 19, 2024 at 2:49 PM Alam-Saran @.> wrote: @.* commented on this pull request. ------------------------------ In core/src/test/java/io/grpc/internal/ManagedChannelImplTest.java <#11368 (comment)>: > @@ -4476,6 +4476,38 @@ public String getDefaultScheme() { } } + @test @kannanjgithub https://github.com/kannanjgithub Any further updates on the above. — Reply to this email directly, view it on GitHub <#11368 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHB2YHABPEGDPVE7HYJHPELZNDKYTAVCNFSM6AAAAABKUR6Q6GVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCOBXG43TAOBRGU . You are receiving this because you were mentioned.Message ID: </pull/11368/review/2187770815 @.*** com>

none of the scenarios mentioned above are not going to our flow where we added applying the default service config change on Error.
As of now we have not added any assertions because its not calling our change.

@kannanjgithub
Copy link
Contributor

kannanjgithub commented Jul 22, 2024 via email

@shivaspeaks
Copy link
Member

When you say not working, which part is not working? Did you add any test assertions and did they fail? Also what cases are the 3 scenarios you listed each for?

On Fri, Jul 19, 2024 at 2:49 PM Alam-Saran @.> wrote: _@**._* commented on this pull request. ------------------------------ In core/src/test/java/io/grpc/internal/ManagedChannelImplTest.java <#11368 (comment)>: > @@ -4476,6 +4476,38 @@ public String getDefaultScheme() { } } + @test @kannanjgithub https://github.com/kannanjgithub Any further updates on the above. — Reply to this email directly, view it on GitHub <#11368 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHB2YHABPEGDPVE7HYJHPELZNDKYTAVCNFSM6AAAAABKUR6Q6GVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZDCOBXG43TAOBRGU . You are receiving this because you were mentioned.Message ID: </pull/11368/review/2187770815 _@_.* com>

none of the scenarios mentioned above are not going to our flow where we added applying the default service config change on Error. As of now we have not added any assertions because its not calling our change.

I did not completely understand the issue here, but on a high level I assume you're not able to call onError() method. Maybe you can try inducing error using Mockito and then verify.
As this is a unit test and we need line coverage here that it works properly in onError() method. Case would have been different had this been an integration test.

@kannanjgithub
Copy link
Contributor

It looks like it got stuck. Can you manually trigger the checks by pushing an empty commit?

git commit --allow-empty -m "Trigger Build"

@vinodhabib
Copy link
Author

@kannanjgithub Still looks same after empty commit also

@ejona86 ejona86 added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Aug 7, 2024
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Aug 7, 2024
@kannanjgithub
Copy link
Contributor

All checks have passed now. Please squash all commits into one using the 'Squash and merge' option. In the squashed commit change the commit title from "Fix for Default service config..." to a more meaningful "Use default service config if initial name resolution fails #11368".

@vinodhabib
Copy link
Author

vinodhabib commented Aug 8, 2024

@kannanjgithub Looks like i don't have write access to merge the PR as its saying the same and don't see merge button as well.
not sure do i need any additional permission required here. please suggest.

@kannanjgithub
Copy link
Contributor

I remembered only the repo maintainers have the merge access, so we can do it for you.
I also would like @ejona86 to approve the PR since he has also reviewed this PR before.

@vinodhabib
Copy link
Author

@ejona86 Request you to do the needful on this PR

@@ -1840,6 +1841,13 @@ private void handleErrorInSyncContext(Status error) {
if (NameResolverListener.this.helper != ManagedChannelImpl.this.lbHelper) {
return;
}
// Apply Default Service Config if initial name resolution fails.
if (lastServiceConfig == EMPTY_SERVICE_CONFIG && defaultServiceConfig != null) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we initialize things to EMPTY_SERVICE_CONFIG, but we also use it after the initial resolution. And defaultServiceConfig != null shouldn't be relevant.

When we receive addresses but an invalid service config, we use do check whether there is a defaultServiceConfig. But if serverConfigError == null, then we still use EMPTY_SERVICE_CONFIG.

} else if (defaultServiceConfig != null) {
effectiveServiceConfig = defaultServiceConfig;
realChannel.updateConfigSelector(effectiveServiceConfig.getDefaultConfigSelector());
channelLogger.log(
ChannelLogLevel.INFO,
"Received no service config, using default service config");
} else if (serviceConfigError != null) {
if (!serviceConfigUpdated) {
// First DNS lookup has invalid service config, and cannot fall back to default
channelLogger.log(
ChannelLogLevel.INFO,
"Fallback to error due to invalid first service config without default config");
// This error could be an "inappropriate" control plane error that should not bleed
// through to client code using gRPC. We let them flow through here to the LB as
// we later check for these error codes when investigating pick results in
// GrpcUtil.getTransportFromPickResult().
onError(configOrError.getError());
return configOrError.getError();
} else {
effectiveServiceConfig = lastServiceConfig;
}
} else {
effectiveServiceConfig = EMPTY_SERVICE_CONFIG;
realChannel.updateConfigSelector(null);
}

If serverConfigError != null then we go through onError() which triggers realChannel.onConfigError() just a few lines up in this method. onConfigError() however doesn't check defaultServiceConfig:

void onConfigError() {
if (configSelector.get() == INITIAL_PENDING_SELECTOR) {
updateConfigSelector(null);
}
}

So I think the problem is "addresses + serviceConfigError" handles the default properly. But "address resolution error" ignores the defaultServiceConfig in RealChannel.onConfigError(). Looks like we should change onConfigError(), which is already doing proper checking to see if we've applied the service config already.

Copy link
Author

@vinodhabib vinodhabib Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejona86 @kannanjgithub i understand 2 below points from your above comments, please confirm/suggest the same.

Point 1 : you mentioned in the below point the check defaultServiceConfig != null shouldn't be relevant as it will have default value so this check is not required here?
if (lastServiceConfig == EMPTY_SERVICE_CONFIG && defaultServiceConfig != null)

Point 2 : "address resolution error" ignores the defaultServiceConfig in RealChannel.onConfigError() ? needs to updated as below or need to add one more if condition to check this? not clear on this point?

void onConfigError() {
  if (configSelector.get() == INITIAL_PENDING_SELECTOR && defaultServiceConfig == null ) {
    updateConfigSelector(null);
  }
}

Copy link
Contributor

@kannanjgithub kannanjgithub Aug 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we already have this handling in NameResolverListener::onError (for address resolution errors) that ends up updating the config selector but with null.

NameResolverListener::onError -> NameResolverListener::handleErrorInSyncContext -> realChannel.onConfigError ->

  if (configSelector.get() == INITIAL_PENDING_SELECTOR) {
    updateConfigSelector(null);
  }

So the call to update the config selector already exists (that we missed noticing), only it uses null always and doesn't use the defaultServiceConfig. If we just change here to

 updateConfigSelector(defaultServiceConfig); 

it would do. The check if (configSelector.get() == INITIAL_PENDING_SELECTOR) also already makes sure that the policy is only applied once, not on repeat invocations.

This means we didn't have to change anything in handleErrorInSyncContext. The other point was about the checks in that change for EMPTY_SERVICE_CONFIG was not correct but this value also gets assigned during a successful name resolution but with empty config, so our use of this value to mean that the config was never applied before was not correct. We need not continue discussion on this point since it turns out the code change in this method was not necessary to do.

Copy link
Author

@vinodhabib vinodhabib Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kannanjgithub As discussed in the call changes are in progress and let you know once its ready for review

Copy link
Author

@vinodhabib vinodhabib Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kannanjgithub code changes are done for the recent review points. please have a look once.
also updated the UT as per my current understanding which is covered as part of onError() flow as I don't see in the existing cases where they are covering the realChannel.onConfigError() directly as its private class.

Copy link
Contributor

@kannanjgithub kannanjgithub Aug 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is ManagedChannelImpl::getConfigSelector() method that exposes its real channel's config selector. We should use this method and assert that it is same as the default service config. See example uses of this method in tests here.
Asserting for the config selector that gets set is important, asserting for channel trace events is secondary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kannanjgithub Review point fixed and Request you have a look once.
unable to cover line lastServiceConfig = defaultServiceConfig as its final class so unable to mock this behavior also I have seen this in the existing cases as no where directly called this its covered as part of onError() flow.

@kannanjgithub kannanjgithub dismissed their stale review September 4, 2024 09:40

Earlier set of changes was not the correct way to do as there was a call to onConfigError already in place.

assertEquals(channel.getConfigSelector().getClass().getName(),
managedChannelServiceConfig.getDefaultConfigSelector().getClass().getName());
assertEquals(channel.getLastServiceConfig().toString(),
managedChannelServiceConfig.toString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Said not to create a method to expose the last service config. Last service config is very much an implementation detail and should not be exposed.


nameResolverFactory.resolvers.get(0).listener.onError(resolutionError);

assertEquals(channel.getConfigSelector().getClass().getName(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is not needed, class names are set at compile time and it is impossible for them to change at runtime, so asserting them achieves nothing.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

channelBuilder.defaultServiceConfig(defaultServiceConfig);
ManagedChannelServiceConfig managedChannelServiceConfig =
createManagedChannelServiceConfig(defaultServiceConfig, null);
nameResolverFactory.nextConfigOrError.set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of setting nextConfigOrError? I don't see it being used in the course of the test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give the info like we are getting the defaultServiceConfig details which are internally setting to through ManagedChannelServiceConfig and setting this into nameResolverFactory which we are using below.

defaultServiceConfig -> ManagedChannelServiceConfig -> nameResolverFactory.nextConfigOrError.

do you thinks its not required? because I have seen some existing references similar to this in existing UT's in the same class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If existing UTs are setting it they are using it. I don't see how your test uses it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and Removed this setup/config step as it's not harming test flow.

channelBuilder.defaultServiceConfig(defaultServiceConfig);
ManagedChannelServiceConfig managedChannelServiceConfig =
createManagedChannelServiceConfig(defaultServiceConfig, null);
nameResolverFactory.nextConfigOrError.set(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question here, doesn't seem to get used .

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will give the info like we are getting the defaultServiceConfig details which are internally setting to through ManagedChannelServiceConfig and setting this into nameResolverFactory which we are using below.

defaultServiceConfig -> ManagedChannelServiceConfig -> nameResolverFactory.nextConfigOrError.

do you thinks its not required? because I have seen some existing references similar to this in existing UT's in the same class.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed and Removed this setup/config step as it's not harming test flow.


// initial service config is already applied so it's not reapplied using the default service
// config here.
assertThat(getStats(channel).channelTrace.events).hasSize(prevSize + 2);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this assertion for prevSize + 2 fit in with the above comment? The comment seems to imply that the count should not have increased.

Copy link
Author

@vinodhabib vinodhabib Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to delete this comment as it was added as part of previous implementation. I will delete it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants