2019-09-22

FeedCacheService.IsRepopulationNeeded

I've noticed on my dev farm error 6398 in the application event log which happens only periodically:
The Execute method of job definition Microsoft.Office.Server.UserProfiles.LMTRepopulationJob (ID 59afb507-292a-40d1-97c3-0a038c9bf0e1) threw an exception. More information is included below.

Unexpected exception in FeedCacheService.IsRepopulationNeeded: Cache cluster is down, restart the cache cluster and Retry.

In ULS I found more details about the error:
Unexpected error occurred in method 'GetObject' , usage 'FeedCache' - Exception 'Microsoft.ApplicationServer.Caching.DataCacheException: ErrorCode<ERRCA0017>:SubStatus<ES0006>:There is a temporary failure. Please retry later. (One or more specified cache servers are unavailable, which could be caused by busy network or servers. For on-premises cache clusters, also verify the following conditions. Ensure that security permission has been granted for this client account, and check that the AppFabric Caching Service is allowed through the firewall on all cache hosts. Also the MaxBufferSize on the server must be greater than or equal to the serialized object size sent from the client.) ---> System.ServiceModel.CommunicationException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.IO.IOException: The write operation failed, see inner exception. ---> System.ServiceModel.CommunicationException: The socket was aborted because an asynchronous receive from the socket did not complete within the allotted timeout of 00:01:00. The time allotted to this operation may have been a portion of a longer timeout. ---> System.ObjectDisposedException: Cannot access a disposed object.  Object name: 'System.Net.Sockets.Socket'.    
 at System.Net.Sockets.Socket.Send(Byte[] buffer, Int32 offset, Int32 size, SocketFlags socketFlags, SocketError& errorCode)    
 at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)     -
 -- End of inner exception stack trace ---    
 at System.ServiceModel.Channels.SocketConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)    
 at System.ServiceModel.Channels.BufferedConnection.WriteNow(Byte[] buffer, Int32 offset, Int32 size, TimeSpan timeout, BufferManager bufferManager)    
 at System.ServiceModel.Channels.BufferedConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)    
 at System.ServiceModel.Channels.ConnectionStream.Write(Byte[] buffer, Int32 offset, Int32 count)    
 at System.Net.Security.NegotiateStream.StartWriting(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)    
 at System.Net.Security.NegotiateStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)     -
 -- End of inner exception stack trace ---    
 at System.Net.Security.NegotiateStream.ProcessWrite(Byte[] buffer, Int32 offset, Int32 count, AsyncProtocolRequest asyncRequest)    
 at System.Net.Security.NegotiateStream.Write(Byte[] buffer, Int32 offset, Int32 count)    
 at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)     -
 -- End of inner exception stack trace ---    
 at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout)    
 at System.ServiceModel.Channels.StreamConnection.Write(Byte[] buffer, Int32 offset, Int32 size, Boolean immediate, TimeSpan timeout, BufferManager bufferManager)    
 at System.ServiceModel.Channels.FramingDuplexSessionChannel.OnSendCore(Message message, TimeSpan timeout)    
 at System.ServiceModel.Channels.TransportDuplexSessionChannel.OnSend(Message message, TimeSpan timeout)    
 at System.ServiceModel.Channels.OutputChannel.Send(Message message, TimeSpan timeout)    
 at Microsoft.ApplicationServer.Caching.CacheResolverChannel.Send(Message message, TimeSpan timeout)    
 at Microsoft.ApplicationServer.Caching.WcfClientChannel.SendOnChannel(EndpointID endpoint, TimeSpan& timeout, WaitCallback callback, Object state, Boolean async, IDuplexSessionChannel channel, Message message)     -
 -- End of inner exception stack trace ---    
 at Microsoft.ApplicationServer.Caching.DataCache.ThrowException(ResponseBody respBody, RequestBody reqBody)    
 at Microsoft.ApplicationServer.Caching.DataCache.InternalGet(String key, DataCacheItemVersion& version, String region, IMonitoringListener listener)    
 at Microsoft.ApplicationServer.Caching.DataCache.<>c__DisplayClass51.<Get>b__50()    

 at Microsoft.Office.Server.DistributedCaching.SPDistributedCache.GetObject(String key, String regionName)'.

As per the post on Premier Field Engineering Developer Blog, this is a generic error. In order to get additional details, logging should be enabled on Distributed cache service's config file (for me it was C:\Program Files\AppFabric 1.1 for Windows Server\DistributedCacheService.exe.config):
<system.diagnostics>
    <sources>
      <source name="System.ServiceModel"
              switchValue="Information, ActivityTracing"
              propagateActivity="true">
        <listeners>
          <add name="traceListener"
              type="System.Diagnostics.XmlWriterTraceListener"
              initializeData= "c:\Temp\DistributedCacheService.svclog" />
        </listeners>
      </source>
    </sources>
    <trace autoflush="true" />
  </system.diagnostics>

I restarted the Distributed cache service and now I am waiting for another instance of the error to happen so I can get the details of the actual exception.

EDIT: Nothing useful found in trace log. I've enabled verbose log for SharePoint Server > Distributed Cache.

EDIT2: I've enabled admin and debug event logs for distributed cache service.

2019-09-18

An SPPerformanceCounter was not properly disposed

I came across an unexpected error in ULS on my SharePoint 2013 development farm:
An SPPerformanceCounter was not properly disposed.  This could cause excessive memory use.

I've noticed it happens immediately after Health Analysis Job (Hourly, Distributed Cache, All Servers) timer job has executed. In the ULS I also saw that the following rules are being executed by aforementioned timer job:

  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostDown
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostFirewallSettingsIncorrect
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheObjectEvicted
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostApproachingThrottling
I've reflected Microsoft.SharePoint.Health.dll in order to map rules to the descriptions in CA. Each of the classes mentioned above has a property named Summary. In the getter of the property there's a key to resource, such as HealthRule_Summary_SPDistributedCacheObjectEvicted for SPDistributedCacheObjectEvicted class. Next I reflected Microsoft.SharePoint.intl.dll and searched for the keys in Microsoft.SharePoint.resources of the assembly. This is how mapping between rules (classes) and descriptions looks like:

  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostDown - One of the cache hosts in the cluster is down.
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostFirewallSettingsIncorrect - Firewall client settings on the cache host are incorrect.
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheObjectEvicted - Cached objects have been evicted.
  • Microsoft.SharePoint.Administration.Health.SPDistributedCacheHostApproachingThrottling - The Current server is running low on memory.
Then I opened Central Administration (Monitoring > Review rule definitions) and ran the rules one by one while monitoring ULS real time. I found the offending rule in Microsoft.SharePoint.Administration.Health.SPDistributedCacheObjectEvicted

Piece of code which is causing the issue is located in Microsoft.SharePoint.dll assembly, namely by the method Microsoft.SharePoint.DistributedCaching.Utilities.SPDistributedCacheHealthRulesHelper.GetTotalObjectsEvicted(). Line 149 is causing the error:
num = (uint) new SPPerformanceCounter("AppFabric Caching:Host", "Total Evicted Objects").NextValue();

My development farm is on SP1 level. Since I can't upgrade the farm a newer version, because production farm is on that level, I had only two options to resolve the issue:
  1. Treat it as a false positive, i.e. ignore the error.
  2. Disable the rule in CA.
I chose option two. I disabled the rule in CA and ran the timer job again. There were no unexpected errors in ULS any more.

Error Event with ID=3

This morning I was greeted with the error in Application log of Event Log on my dev environment:
WebHost failed to process a request.
 Sender Information: System.ServiceModel.Activation.HostedHttpRequestAsyncResult/32074454
 Exception: System.ServiceModel.ServiceActivationException: The service '/_vti_bin/ServiceName/ServiceName.svc' cannot be activated due to an exception during compilation.  The exception message is: The type 'ServiceName', provided as the Service attribute value in the ServiceHost directive, or provided in the configuration element system.serviceModel/serviceHostingEnvironment/serviceActivations could not be found.. ---> System.InvalidOperationException: The type 'ServiceName', provided as the Service attribute value in the ServiceHost directive, or provided in the configuration element system.serviceModel/serviceHostingEnvironment/serviceActivations could not be found.
   at System.ServiceModel.Activation.ServiceHostFactory.CreateServiceHost(String constructorString, Uri[] baseAddresses)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.CreateService(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.ActivateService(ServiceActivationInfo serviceActivationInfo, EventTraceActivity eventTraceActivity)
   at System.ServiceModel.ServiceHostingEnvironment.HostingManager.EnsureServiceAvailable(String normalizedVirtualPath, EventTraceActivity eventTraceActivity)
   --- End of inner exception stack trace ---
   at System.Runtime.AsyncResult.End[TAsyncResult](IAsyncResult result)
   at System.ServiceModel.Activation.HostedHttpRequestAsyncResult.End(IAsyncResult result)
 Process Name: w3wp
 Process ID: 17160

I double checked to see whether the type in question really is present in the assembly. I also checked to see whether this assembly has been loaded by w3wp.exe process. Nothing pointed to anything unusual.

Then I went to check IIS log and there I found out that the request returned HTTP status code 401. Turns out that during the night worker process recycled by schedule and I left the SharePoint page opened in my browser. SharePoint page contains a piece of javascript code which periodically calls WCF service deployed in _vti_bin folder. Since browser is not on the same domain as dev environment it prompted for password. There was no one there during the night to enter the password so server returned 401. Once I logged in errors went away. False positive.

2019-09-08

Error processing policy updates

I've seen error event with Event ID 7997 in the Event log. Task category is Information Policy Management.
Error processing policy updates for site http://site for list listname.
Error: System.NullReferenceException: Object reference not set to an instance of an object.
at Microsoft.Office.RecordsManagement.InformationPolicy.ListPolicySettings.get_UseListPolicy() at Microsoft.Office.RecordsManagement.InformationPolicy.Policy.<>c__DisplayClass5.<>c__DisplayClass7.b__1(SPList list) at Microsoft.Office.Server.Utilities.ContentIterator.ProcessLists(SPListCollection lists, ListProcessor listProcessor, ListProcessorErrorCallout errorCallout)
List listname is a part of custom feature.
Initial analysis points to Information management policy timer job.
Inner exception is Arithmetic overflow error converting expression to data type int. SQL Server stored procedure: proc_GetDocsMetaInfo.
Upon further digging I found that store procedure causing the issue was dbo.proc_GetDocsMetaInfoSingle. In particular it was this part: Lists.tp_BaseType + Lists.tp_ServerTemplate * 256. This caused arithmetic overflow because Type for ListTemplate was set to a value greater than 10,000,000. Once I reduced the value error was not present any more.

This must be obeyed:
BaseType + Type * 256 < 2^31 - 1

Failure retrieving Diagnostics Logs location

I've been getting these Unexpected ULS entries from Timer Job hourly-all-sptimerservice-health-analysis-job:
Failure retrieving Diagnostics Logs location:
System.ArgumentException: Object must be a root directory ("C:\") or a drive letter ("C").
at System.IO.DriveInfo..ctor(String driveName)
at Microsoft.SharePoint.Administration.Health.AppServerDrivesAreNearlyFull. GetDrivesForValidation()
Turns out I had to change Central Administration > Monitoring > Configure diagnostic logging > Path from %CommonProgramFiles%\Microsoft Shared\Web Server Extensions\15\LOGS to C:\Program Files\Common Files\microsoft shared\Web Server Extensions\15\LOGS.

Error is caused by Health Analysis Job (Hourly, Microsoft SharePoint Foundation Timer, All Servers) timer job. Specifically DriveInfo constructor cannot resolve environment variables, so log location has to be a real path.

Alternative Powershell script solution:
$diagSvc = [Microsoft.SharePoint.Administration.SPDiagnosticsService]::Local
$commonProgramFiles = Get-ChildItem -Path Env:\ | where Name -EQ "CommonProgramFiles"
$newLogLocation = $diagSvc.LogLocation.Replace("%CommonProgramFiles%", $commonProgramFiles[0].Value)
$diagSvc.LogLocation = $newLogLocation

$diagSvc.Update()

2019-09-03

Event Log Warning Event ID 8088 - Taxonomy

I experienced a lone Warning event in the Application Event log with the ID 8088 after server restart. Text of the event was 'The Managed Metadata Service 'Managed Metadata Service Application' is inaccessible.'.

I dug deeper in ULS and found this log entry:
Failed to get term store for proxy 'Managed Metadata Service Application'. Exception: System.TimeoutException: The request channel timed out while waiting for a reply after 00:00:09.9375192. Increase the timeout value passed to the call to Request or increase the SendTimeout value on the Binding. The time allotted to this operation may have been a portion of a longer timeout. ---> System.TimeoutException: The HTTP request to 'http://sp2:32843/ee1f85842acc4eb1b543bb2b1395980c/MetadataWebService.svc' has exceeded the allotted timeout of 00:00:09.9990000. The time allotted to this operation may have been a portion of a longer timeout. ---> System.Net.WebException: The operation has timed out   
 at System.Net.HttpWebRequest.GetResponse()   
 at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)     -
 -- End of inner exception stack trace ---   
 at System.ServiceModel.Channels.HttpChannelUtilities.ProcessGetResponseWebException(WebException webException, HttpWebRequest request, HttpAbortReason abortReason)   
 at System.ServiceModel.Channels.HttpChannelFactory`1.HttpRequestChannel.HttpChannelRequest.WaitForReply(TimeSpan timeout)   
 at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)     -
 -- End of inner exception stack trace ---    Server stack trace:   
 at System.ServiceModel.Channels.RequestChannel.Request(Message message, TimeSpan timeout)   
 at System.ServiceModel.Channels.SecurityChannelFactory`1.SecurityRequestChannel.Request(Message message, TimeSpan timeout)   
 at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)   
 at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)   
 at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)   
 at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)    Exception rethrown
 at [0]:   
 at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)   
 at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)   
 at Microsoft.SharePoint.Taxonomy.Internal.IDataAccessReadOnly.GetSessionData(Guid& termStoreId, Guid rawPartitionId, Int32 lcid, String systemGroupName, String systemGroupDescription, String keywordsTermsetName, String keywordsTermsetDescription, String orphanedTermsTermsetName, String orphaneedTermsTermsetDescription, String hashtagsTermsetName, String hashtagsTermsetDescription)   
 at Microsoft.SharePoint.Taxonomy.Internal.TaxonomyProxyAccess.<>c__DisplayClass4.<GetSessionData>b__3(IMetadataWebServiceApplication serviceApplication)   
 at Microsoft.SharePoint.Taxonomy.MetadataWebServiceApplicationProxy.<>c__DisplayClass38.<RunOnChannel>b__36()   
 at Microsoft.Office.Server.Security.SecurityContext.RunAsProcess(CodeToRunElevated secureCode)   
 at Microsoft.SharePoint.Taxonomy.MetadataWebServiceApplicationProxy.RunOnChannel(CodeToRun codeToRun, Double operationTimeoutFactor)   
 at Microsoft.SharePoint.Taxonomy.Internal.TaxonomyProxyAccess.GetSessionData(Guid& termStoreId, Guid rawPartitionId, Int32 lcid, String systemGroupName, String systemGroupDescription, String keywordsTermsetName, String keywordsTermsetDescription, String orphanedTermsTermsetName, String orphanedTermsTermsetDescription, String hashtagsTermsetName, String hashtagsTermsetDescription)   
 at Microsoft.SharePoint.Taxonomy.Internal.DataAccessManager.GetSessionData(Guid& termStoreId)    
Source of the log entry was Timer Job Query Classification Dictionary Update for Search Application.

So a web service call to Managed Metadata web service timed out after roughly 10 seconds. After a bit of digging I found out that the timer job uses C:\Program Files\Microsoft Office Servers\15.0\WebClients\Metadata\client.config for the client endpoint configuration. There I saw that both http and https bindings were set up to have timeouts set to 30 seconds. This puzzled me so I reflected the code in question. Turns out that timer job is using operationTimeoutFactor which was set to 0.333333 in my case. It multiplies the factor with timeout setting in the configuration file to get the real timeout. That's why web service call timed out after 30 * 0.333333 = 9.999 seconds.

I just increased all of the timeouts to 1:30 minutes and restarted to SharePoint Timer service. Warnings in the Event Log didn't appear any more.

C:\Program Files\Microsoft Office Servers\15.0\WebClients\Metadata\client.config after modification:
<?xml version="1.0" encoding="utf-8" ?> <configuration> <system.serviceModel> <client> <endpoint name="http" contract="Microsoft.SharePoint.Taxonomy.IMetadataWebServiceApplication" binding="customBinding" bindingConfiguration="MetadataWebServiceHttpBinding" /> <endpoint name="https" contract="Microsoft.SharePoint.Taxonomy.IMetadataWebServiceApplication" binding="customBinding" bindingConfiguration="MetadataWebServiceHttpsBinding" /> </client> <bindings> <customBinding> <binding name="MetadataWebServiceHttpBinding" receiveTimeout="00:01:30" sendTimeout="00:01:30" openTimeout="00:01:30" closeTimeout="00:01:30"> <security authenticationMode="IssuedTokenOverTransport" allowInsecureTransport="true" /> <binaryMessageEncoding> <readerQuotas maxStringContentLength="2147483647" maxArrayLength="2147483647" maxBytesPerRead="2147483647" /> </binaryMessageEncoding> <httpTransport transferMode="StreamedResponse" maxReceivedMessageSize="2147483647" authenticationScheme="Anonymous" useDefaultWebProxy="false" /> </binding> <binding name="MetadataWebServiceHttpsBinding" receiveTimeout="00:01:30" sendTimeout="00:01:30" openTimeout="00:01:30" closeTimeout="00:01:30"> <security authenticationMode="IssuedTokenOverTransport" /> <binaryMessageEncoding> <readerQuotas maxStringContentLength="2147483647" maxArrayLength="2147483647" maxBytesPerRead="2147483647" /> </binaryMessageEncoding> <httpsTransport transferMode="StreamedResponse" maxReceivedMessageSize="2147483647" authenticationScheme="Anonymous" useDefaultWebProxy="false" /> </binding> </customBinding> </bindings> </system.serviceModel> <system.net> <connectionManagement> <add address="*" maxconnection="10000" /> </connectionManagement> </system.net> </configuration>