Hi there,

Massive shot in the dark here, but I am struggling with a pretty major issue at the moment. We have a production file server that is hosted on the following:

Dell MD 3220i -> iSCSI -> Server 2008 R2 Hyper-V Cluster -> Passthrough Disk -> Server 2012 R2 File Server VM

Essentially, 3 times now, roughly a month or so apart, the file server has stopped accepting connections.
During this time, the server is perfectly accessible through RDP or with a simple ping. I can browse the files on the server directly, but no one appears to be able to access the shares over SMB. A reboot of the server fixes the issue. As per a KB article, I removed NOD antivirus from the server after the second fault to rule out a conflicting filter mode driver. Sadly, yesterday it happened again.

The only relevant errors in the server's log files are:

SMB Server Event ID 551
SMB Session Authentication Failure
Client Name: 192.168.105.79
Client Address: 192.168.14
User Name: HHS H6-08$
Session ID: 0xFFFFFFFFFFFFFFFF
Status: Insufficient server resources exist to complete the request. (0xC0000205)
Guidance: You should expect this error when attempting to connect to shares using incorrect credentials.
This error does not always indicate a problem with authorization, but mainly authentication. It is more common with non-Windows clients.
This error can occur when using incorrect usernames and passwords with NTLM, mismatched LmCompatibility settings between client and server, duplicate Kerberos service principal names, incorrect Kerberos ticket-granting service tickets, or Guest accounts without Guest access enabled.

and

SMB Server Event ID 1020
File system operation has taken longer than expected.
Client Name: 192.168.105.97
Client Address: 192.168.11
User Name: HHS 12J.Champion
Session ID: 0x2C07B40004A5
Share Name: * Subjects
File Name:
Command: 5
Duration (in milliseconds): 176784
Warning Threshold (in milliseconds): 120000
Guidance: The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB.

I have checked the underlying disk/iSCSI/network/Hyper-V cluster for any other errors or issues, but as far as I can tell everything is fine. Is it possible that something else is left over from the NOD antivirus installation?
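For anyone collecting these 1020 entries to compare between servers, here is a minimal Python sketch that pulls the duration and warning threshold out of the message text quoted above. The function name `parse_slow_op` is made up for illustration, and it assumes the message has already been exported as plain text in the wording shown in this thread; it does not read the Windows event log itself.

```python
import re

def parse_slow_op(event_text):
    """Extract duration and warning threshold (ms) from an Event ID 1020 message.

    Returns None if either field is missing from the text.
    """
    duration = re.search(r"Duration \(in milliseconds\):\s*(\d+)", event_text)
    threshold = re.search(r"Warning Threshold \(in milliseconds\):\s*(\d+)", event_text)
    if not (duration and threshold):
        return None
    d = int(duration.group(1))
    t = int(threshold.group(1))
    # ratio > 1 means the operation ran longer than the warning threshold
    return {"duration_ms": d, "threshold_ms": t, "ratio": d / t}

# Values taken from the event quoted in the post above.
sample = ("Command: 5 Duration (in milliseconds): 176784 "
          "Warning Threshold (in milliseconds): 120000")
result = parse_slow_op(sample)
print(result)
```

For the entry above, the operation took roughly 177 seconds against a 120 second threshold, so the ratio comes out just under 1.5.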
Looking for suggestions on how to troubleshoot this further.

Hi there,

Thanks for the quick lesson on iSCSI best practice. As stated, I have already checked the underlying storage/networking/iSCSI/MPIO etc., and there are no problems at all. The same iSCSI/cluster has been running production VMs for 4 years now without any issues. I find it weird that when the SMB service gets locked up like this, I can still browse the files fine on the server.
That would rule out any underlying physical storage issue, surely? One theory I had is the use of an iSCSI passthrough disk from the 2008 R2 host to the 2012 R2 guest.
This is the only thing unique to this VM; all other guest VMs use VHD files on CSVs. Thanks for the responses.
I will try restarting the Server service next time it occurs. Sadly, as this only happens once a month or so, it may be a while until the condition occurs again. Just for some extra background on the server and its setup:
• The pass-through disk is configured as a single NTFS volume at around 9TB in size.
• The volume is presented as a drive letter and then each share (around 8-10 of them) is a subfolder on the disk.
• The volume does have de-duplication enabled. It's currently 6TB deduped down to 3TB.
• The server was upgraded from Server 2012 to Server 2012 R2 before being deployed in production.

As a long-term solution, it may just be a case of building a fresh server to move over to. I'll message again next time it happens.

Hi Dan,

I'd be very interested if you have found a solution to this, as it sounds like we are having the exact same problem with 2 of our file servers. Both are Server 2012 (haven't upgraded them to R2 yet) and are virtual machines accessing storage with a pass-through fibre connection.
Like yourself, when the problem occurs the servers are completely responsive, pingable and can be connected to on RDP, where I can access the storage directly with no problem. One of the servers has an application providing AFP support for our Apple Macs, and the Macs continued to access their home directories when this happened, so it's definitely not storage related; it's only clients connecting via SMB that are affected. Also, a quick server reboot fixes the problem. I will try restarting the Server service when it next happens. Annoyingly, there is nothing in the logs around the time that this happens, so the error messages you've posted may not be related to the problem?!

Hello all,

I too am experiencing this issue on the latest and 'greatest' Windows Server OS. I have tested this on 2012 and 2012 R2 and experienced the issue on both builds. I am running the servers on Hyper-V 2012 R2 and have SR-IOV enabled on the server NICs to rule out the Microsoft Hyper-V networking stack (although this did occur with the VMQ-enabled NICs too).
Today I made one change and I will see if it helps: I removed any hidden NICs from Device Manager. Please keep me posted if you make any progress on resolving this issue on your servers.
Thank you, Fred.

It happened again! So this is the fourth confirmed case now.
Being a little more clued up, I observed the following this time:
• Random clients were disconnected or could not connect. Others were still connected fine.
• No errors were being logged in the event log.
• No storage or cluster errors were apparent.
• Tried restarting the Server service. It failed to restart and just hung at 'stopping'. After telling it to stop, a lot of new messages were logged.
• Being in production, I had to restart the server to get our files working again. Much as I would love to pore over it and troubleshoot for a few hours, my phone wouldn't stop ringing.

The new error message:

Event ID 2012 - Source: srv
While transmitting or receiving data, the server encountered a network error. Occasional errors are expected, but large amounts of these indicate a possible error in your network configuration. The error status code is contained within the returned data (formatted as Words) and may point you towards the problem.
Greetings,

Finally found this thread after weeks of searching. I am having the same or similar issue. Granted, mine is just an enthusiast home setup, but here's what I'm seeing:
• Originally was running Hyper-V Server R2
• One of the guest OSs (also Win2k12 R2) was a file server with a pass-through 15TB array on an Areca 1280ML
• The host VM disk is formatted NTFS, and the 15TB passthrough volume is ReFS.
• All Intel NICs, though at this point they are probably 2-5 years old.
• Supermicro X8STE with Xeon W3520 and 24GB non-ECC RAM.
• Netgear GS748Tv3 switch
• Many configurations of NIC teaming, and even straight host serving of the files through a single NIC.

Over the last several months I have ruled out everything I can think of, except for the server.
Since it's really only me using the servers, I'll mostly notice it when streaming content. I'll get a win32 I/O #59 error, suggesting a network failure.
It happens sporadically, usually about once an hour, but on occasion I won't see the issue for many hours. Then, in Event Viewer, I see the 551 error described above:

SMB Session Authentication Failure
Client Name: 192.168.0.11
Client Address: 192.168.0.11:50758
User Name:
Session ID: 0x15
Status: The attempted logon is invalid. This is either due to a bad username or authentication information. (0xC000006D)
Guidance: You should expect this error when attempting to connect to shares using incorrect credentials. This error does not always indicate a problem with authorization, but mainly authentication. It is more common with non-Windows clients. This error can occur when using incorrect usernames and passwords with NTLM, mismatched LmCompatibility settings between client and server, duplicate Kerberos service principal names, incorrect Kerberos ticket-granting service tickets, or Guest accounts without Guest access enabled.

Things I've tried which seem to suggest an issue with the OS:
• Wired up the server to a separate (cheap) switch directly with my client. Problem was reproduced.
• Reconfigured NIC teaming in every combination available, including disabling it. Problem was reproduced.
• Copied over a large library of streaming content to a Windows Standard R2 guest OS that is being hosted on ESX 5.5. Problem was NOT reproduced after 24 hours of testing (suggesting everything works fine when the host OS is not Windows). The other box is a very similar setup hardware-wise, minus the large storage. The other box is also connected to the same Netgear switch.
I also tried streaming music content from the guest file server TO a guest Windows 8.1 client both hosted on the same box. Problem was reproduced (I was very surprised by this since my understanding is that it's the virtual switch that would have been doing the talking between the two). I've read articles about how some of my NICs (like the 82574L), while supported in-box, have been found to have issues and can no longer have drivers written for them because of updated WHQL standards. But my test which reproduced the error on the virtual switch seems to disprove any relationship to the physical NICs. It's truly to the point where I'm considering moving this machine to ESX as well.
However, I'd really prefer to stick with what I've got, as I'm tired of working on it. I'll be bookmarking this and will be MORE THAN HAPPY to provide any additional details the community might need. Thank you for your time.

Well, I loaded up the host with ESX 5.5 and installed my 2012 R2 file server as a guest on it.
Configured it all the same as when it was a guest on Hyper-V. Same problem. At this point it sure seems like something in Windows 2012 and above, perhaps with SMB. I haven't tested with 2008 R2 yet. I might try that next, but it was a heck of a lot of work just to get this far and I'm spent.
SMB Session Authentication Failure
Client Name: 192.168.0.180
Client Address: 192.168.0.180:55373
User Name:
Session ID: 0xC
Status: The attempted logon is invalid. This is either due to a bad username or authentication information. (0xC000006D)
Guidance: You should expect this error when attempting to connect to shares using incorrect credentials. This error does not always indicate a problem with authorization, but mainly authentication. It is more common with non-Windows clients. This error can occur when using incorrect usernames and passwords with NTLM, mismatched LmCompatibility settings between client and server, duplicate Kerberos service principal names, incorrect Kerberos ticket-granting service tickets, or Guest accounts without Guest access enabled.
Yesterday I rebuilt the server with all new hardware. Well, new to me: dual Xeon, LGA771 on a Supermicro board, all Intel NICs, and fully buffered ECC Kingston RAM. Loaded it with ESX 5.5 and the same Windows file server guest (Windows 2012 R2 Datacenter Eval). Having ruled out every piece of hardware on my network, I guess I'm left with the possibility that there's some sort of authentication problem between the file server and the domain? The Event Log message seems to suggest that, at least, and this problem doesn't exist while transferring or streaming files from the virtual guest DC on a different host.
Not sure what my next step is. I'm debating converting the guest file server to Ubuntu, which would at least prove or disprove the problem is isolated to the Windows guest. I hope this is being somewhat helpful and that I'm not just having a conversation with myself :-).

Here's what I found over the last 24 hours. As I mentioned, I rebuilt the server with all new hardware.
Which at this point totally eliminates hardware issues at all levels. I tested again with 2k12 R2, same issue.
I reverted to 2k8 R2 today, and the problem was reproduced there as well. So now I'm beyond hardware issues and probably beyond 'Windows' issues. I haven't tried converting the server to Ubuntu yet, but I think my Win2k8 R2 test told me that the problem lies somewhere in configurations (and that the problem isn't related to SMB 3.02). So, since streaming works perfectly fine when test media is located directly on a domain controller, and since Event Viewer entries suggest authentication issues in SMB, I began looking into reasons why any sort of authentication/domain chatter might fail. I have all my servers virtualized, including my domain controllers. And currently it's all on ESX 5.5, as a result of troubleshooting this problem.
I looked into the network configuration of the primary domain controller, the one I was able to successfully stream from. For its network I had it connected to the same virtual switch as the other guests, which has several NICs teamed together. So then I ran across this. Which suggests that NIC teaming for Domain Controllers is a no-no. So, I carved off two vmnics to a new virtual switch, set one as active and one as standby, and am beginning to test again now. I'll let you know how it goes :-).

Well, I think I figured it out.
Like most problems that take weeks to figure out, the solution appears to have been pretty simple. I noticed a bunch of audit failures in the event viewer. And all were related to SMB sharing.
More research took me here. The commands the guy recommended didn't really do anything for me, but I took the premise of the server's 'channel' with the domain controller being corrupted and ran with it. It made sense considering the number of times I've joined and unjoined the file server to the domain. So I unjoined it, renamed it to something that's never been on the domain before, rejoined it. And for the last 2-3 hours I've been able to stream media without any interruptions or errors. I'll let things run all night to be sure, but I think it fixed it.
So, Dan Kingdon, check out that link. If his commands don't tell you anything, maybe look into removing the server from the domain, renaming it, then rejoining it. If that's not feasible in your situation, maybe there's a way to fix the 'channel' without doing all of that.
Hope this has helped.

Before going that route, I decided to try a couple more things, as it just seemed so unlikely that Active Directory was the issue. Also, the many tests I did above, some of which included specific AD diagnostics, all indicated that all was well with AD. So I decided to disperse media throughout physical PCs in the house and run a 15-hour test streaming from them all. Not a single error or failure from streams coming from 3 machines, two physical and one virtual on my other server. This test again suggests that the problem is something specific to the problematic server.
Furthermore since all versions of Windows I've tested show the same error, I'm led to believe AGAIN that it's hardware related. Problem is, at this point I've replaced ALL the hardware in that machine. Except for the Areca RAID controllers.
There are two Arecas in the box, a 1280ML and a 1222. So, one from two different generations. The 1280 normally hosts all my data, and the 1222 my backups. I moved media to the 1222, streamed from there for a couple hours, and reproduced the error.
So then, I sat in a quiet corner and thought for about 20 minutes. These two Areca controllers are ALMOST the only thing unique to this machine vs. all the others. The other server also uses a 1280ML, but it's running VMware with the VMware driver. Virtual machines on ESX aren't accessing the controller directly, which may be why I don't see the problem while streaming from guests on that machine. And of course none of the PCs in the house have Arecas.
Given I've had issues with Areca controllers ever since the release of 2012, I'm thinking now that the culprit is the Areca drivers. Looking back through all my tests, they all reproduced errors when the media was hosted on machines which used the Areca drivers, including ESX guest with the Arecas passed through. In one final test for this theory, I loaded up a Ubuntu 12.04 guest on the problematic Hyper-V machine, and passed through the NTFS partition that I normally pass through to Windows.
I shared out the media folder, and again within about an hour, I got the error. The underlying Areca driver on the Hyper-V host is the one thing all the failure tests seem to have in common. I'm going to pick up an Adaptec from eBay tonight and give that a try this week.

For what it's worth, I too am running into the same exact problem. I've been battling this since late December 2013.
• Running Windows 2012
• Hypervisor is ESXi 5.1
• Server service hangs if I try to restart it during the issue
• The problem, when it occurs, only affects random clients, meaning it continues to work fine for some, but not others. We have multiple wiring closets and the problem is intermixed across all of them.
• I too can RDP / ping to the host from a problem client
• Rebooting the server is the only solution that fixes it.
• I did a packet capture from the server's view and, from what I can see, an SMB negotiate is sent from the client, but the server never responds with the SMB protocol to use. It ends up in a repeating loop until the client gives up.
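The pattern in that last bullet (TCP connection accepted, no application-level reply) can be told apart from a plain network outage with a small probe. The following is only a rough Python sketch under stated assumptions: `probe` and its return labels are invented for illustration, and the payload is supplied by the caller. It does not build a real SMB negotiate request, so pointed at port 445 it only shows whether the TCP layer answers and whether any bytes come back in time.

```python
import socket

def probe(host, port, payload, timeout=5.0):
    """Classify a service as 'no_tcp', 'tcp_only' or 'responded'.

    'tcp_only' means the TCP connection was accepted but no application
    data came back before the timeout (or the peer closed without replying).
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Refused, unreachable, or the connect itself timed out
        return "no_tcp"
    with sock:
        try:
            sock.sendall(payload)
            data = sock.recv(4096)
        except socket.timeout:
            return "tcp_only"
        except OSError:
            # Connection reset or similar failure after connecting
            return "no_tcp"
        return "responded" if data else "tcp_only"
```

Run from an affected client and from a working client at the same time, a `tcp_only` result on one and `responded` on the other would match the "random clients" behaviour described in this thread.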
It's good to know I'm not the only one having this issue.

I'm seeing the same issue on a Server 2012 R2 VM hosted on Server 2012. We're using Dell EqualLogic iSCSI arrays.
I can reproduce the problem easily with Windows 8.1 workstations loading roaming profiles off a 2012 R2 file server. The key event log entry seems to be:

File system operation has taken longer than expected.
Client Name: [2001:630:]
Client Address: [2001:630:]:59115
User Name: DOMAIN 9999
Session ID: 0x3D
Share Name: * Roaming Profiles
File Name: STUDENTS 9999.V4 NTUSER.DAT
Command: 16
Duration (in milliseconds): 77942
Warning Threshold (in milliseconds): 15000
Guidance: The underlying file system has taken too long to respond to an operation. This typically indicates a problem with the storage and not SMB.

I've also noticed Task Manager show 100% disk utilisation but 0 read/write/response time.
I can still browse the disk locally but I can't copy files or make directories. It seems to completely lock up the disk.
Has anyone opened a PSS case? Can you post case numbers - i'm going to open a case and it might be helpful to link cases. My case number is 11053.

I think that sounds like a different issue than what we're describing.
For us, SMB stops working, but the disk sub-system is fine. I can copy files without issue (when logged into the server).

Well, several more weeks and lots more $$ later, I think (for reals) I've found the issue, though I still can't seem to resolve it. It has never been hardware related.
Looking back I should have suspected that from the start. I attempted streaming/copying continuously from the share directly from the IP address of the server. So, 192.168.0.4. Whenever I do that, there's never a problem.
What I'm seeing now is that the problem exists only while streaming from a share accessed through either an A record or a CNAME. Even going directly to the server via the server name fails. I have removed all records of the server, rebooted it and let it re-register itself in DNS, and still the problem persists while accessing via DNS host name or aliases.
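When comparing access by IP against access by name like this, it can help to first confirm what each name actually resolves to from the client. A minimal Python sketch, assuming nothing beyond the standard library; the helper names `resolve_all` and `name_matches_ip` are hypothetical and the check only covers name resolution, not SMB or Kerberos behaviour.

```python
import socket

def resolve_all(name):
    """Return the sorted, de-duplicated addresses a host name resolves to."""
    infos = socket.getaddrinfo(name, None)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})

def name_matches_ip(name, expected_ip):
    """True if expected_ip is among the addresses returned for name."""
    try:
        return expected_ip in resolve_all(name)
    except socket.gaierror:
        # Name could not be resolved at all
        return False
```

For example, calling `name_matches_ip` with the server's host name (or an alias) and `"192.168.0.4"` from the affected client would show whether the A record and CNAME point at the same address that works when used directly.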
There are numerous articles like this one describing additional steps to be taken to accommodate sharing via DNS names, but they haven't worked for me either: So anyways, that's where I stand with it. = good to go = no dice.
I'm not sure what this actually showed me, but last night I let a bunch of stuff stream to a Windows 7 PC in the living room using the FQDN share path, and everything ran beautifully all night. On the other hand, on my 8.1 PC (which is where I've been doing the bulk of the troubleshooting), I streamed content all night long accessing the same share by IP address and, surprisingly, I received errors all night (the share becomes unavailable for a period, then re-establishes itself). Every test I do invalidates the last one.
This 8.1 machine streams fine from all other physical and virtual machines in the house. All either running Windows 7 or 2012 R2, and up until last night it streamed from the server by IP without issue. Let me amend my last post. The 8.1 machine was actually accessing the content through a mapped drive (X:), which was MAPPED to. Sitting here this morning with some ZZ Top playing while I work, everything disconnects briefly.
And if I have any explorer windows open to the X: drive this window pops up: It then goes away and everything resumes. Previously when I streamed without issue, I simply opened up in an explorer window and played content from there. I'll try that again now instead of accessing through a mapped drive. No problems yet so far streaming from directly browsing to share, and not from a mapped drive and I'll report back.
FYI, my maps are handled in Group Policy, and not by login scripts. I did have problems in the past with Group Policy drive mappings in Windows 8+. That problem was fixed by de-selecting 'Reconnect'. As another test, perhaps I'll manually map a drive to and try that. I think we're narrowing it down?
Alright, I have streamed all day successfully when accessing the share directly and NOT through a mapped drive. At least in my scenario, I can pretty confidently say the problem exists only when interacting with a share over mapped network drives.
When interacting with the share directly, like through Explorer by entering the UNC, the problem is not reproducible. But since I WANT to be able to use mapped drives, I'm going to test more scenarios, like manually mapped drives and drives that are mapped by login group policy scripts instead of drive-map policies, and see what that shows me.

Sure seems like that would be the case. But my tests, as relatively non-technical as they are, do seem to suggest it's at least related to mappings. Last night I streamed continuously throughout the night from the manually mapped drive and got failures and unavailability messages several times.
Tests from directly accessed shares have so far never shown the issue (and I can otherwise reproduce it reliably). Is there some other mechanism involved in talking to shares through maps?
Extra authentication, DNS queries, anything? Either from the client or host? Intentional and periodic disconnects and reconnects?
I reproduced this problem as well with 2012, and even 2008 R2 (see above). In all cases, I was using a 2012 R2 Domain Controller/DNS server. And in all cases I was using a Windows 8.1 client.

So, that test I did all night with the manually mapped X: drive (and all other drives disconnected) failed. But then I noticed something this morning while continuing to stream media from it. The GPOs got run and therefore re-mapped all the drives I disconnected, presumably including the X: drive. I realized this only because suddenly all the mapped drives I had manually DISconnected reappeared.
Without needing a reboot or re-login. When this GPO was re-applied, at that very moment all the streaming became unavailable and the errors I've been seeing appeared. So, as of right now, it seems like the drive mapping GPO perhaps gets reapplied periodically. Or something. When this happens any network activity using the mapped drive is interrupted until the new connection is established. To test this, I deleted all my mapped drive GPOs this morning and rebooted my client.
I then manually mapped the X: drive to, and so far for about 4.5 hours there hasn't been one failure. My guess is because the GPO for mapping drives won't ever run.

I've been working through this with Microsoft over the last few days - it's proving to be a tricky one to pin down. Those who are experiencing the problem, I'd be interested to know:
• Is everyone seeing SMBServer Event ID 1020: File system operation has taken longer than expected?
• What OS are your clients running?
• What are your disk counters saying while it's happening? (Diskperf -Y -> Taskmgr -> Active Time, Read Speed, Write Speed? Or Performance Monitor)
• Does restarting the Server service solve the problem? (It does for us, but it takes ages to stop the service, but then again the machine also takes ages to shut down for the same reason)
• Who is seeing SMB negotiation problems?
(Either using Network Monitor 3.4 or powershell Get-SmbConnection. We are seeing Windows 8.1 occasionally incorrectly negotiating SMB3 instead of SMB3.02).

To answer your questions since i think we're seeing the same issue:
• No 1020 event in system or the application log
• Clients are a mix of windows xp through windows 8. TMK, Windows 7 and Windows 8 are the only ones i've seen with the problem. However we have a very small XP population, so that may not be 100% accurate.
• I didn't think to look at the disk counters, but will post the next time i see it (its been a full week with no issues)
• We try to restart the server service, but it timesout. I've never waited to see if it would restart and simply rebooted the server.
Every time we've had the problem, though, the Server service is hung. The reboot itself is actually quick.
• We do see SMB negotiate problems and I have a capture of it. Basically, the negotiate packet comes in from the client to detect the dialect, and then the server never responds with the dialect. There is TCP communication that's sent to the client.

Other info:
Server OS = Windows 2012 R1
AV = ESET (Nod32).
Alright, I've streamed successfully all day with a manually mapped drive and no drive mapping GPO applied. I think as far as my issue is concerned, I'm good to go. If any of you who are experiencing this problem use drive mapping GPOs to map drives, try instead to map drives with login scripts.
I will say that I never had a server freeze requiring a reboot. So, I'm not entirely sure my issue ended up being the same as the OP's. Nonetheless, this is what I've found. Hope it helps somebody out there.

Hi, I work for a Network Solutions provider and we have now seen this problem on at least 3 completely separate customer sites, all using different hardware, but all on 2012.
We built a server to 2012 R2 last week in the hope it had been resolved but the customer has phoned today to say the server stopped serving files to clients and had to be rebooted. The only 100% fix we have found so far is to rebuild the server back to 2008R2 and we have never seen the problem again. We have logged the case with Microsoft and I will update if we get anywhere with it but at the moment most of the blame seems to be on AV (Sophos) although it is fine under 2008R2 and other people with the same problem have tried without AV and the issue still exists. I would be willing to work with anyone/share ideas to try and get this resolved for all of us.

Thanks. Keep us posted and let us know if you need any specifics from our environment.

Thanks all for your replies. We've just managed to capture the information that Microsoft have been requesting.
Essentially, they just wanted a network capture on the server and client at the same time while the problem was occurring, with the client trying to access a UNC on the affected server, and a few other bits thrown in. Eric, I think we are seeing exactly what you see:
• The client sends an SMB Negotiate request to the server: SMB: C; Negotiate, Dialect = PC NETWORK PROGRAM 1.0, LANMAN1.0, Windows for Workgroups 3.1a, LM1.2X002, LANMAN2.1, NT LM 0.12, SMB 2.002, SMB 2.???
• We can see this being received by the server, but it sends no SMB response back. We do see a TCP response on 445, but it's not SMB.
• The client resends the SMB Negotiate approx. every 20 seconds due to a lack of response from the server.

We killed IOMETER (we were using it to stress the file server) and waited another 5 mins, and the server recovered; eventually the client got a response to its SMB negotiate request and they negotiated SMB 3.02 correctly.

Hi everyone,

I have spoken to another IT guy and he has a number of 2012 servers and has not seen this problem yet (or isn't aware of it), and the only difference we can think of is that he is running Datacenter, not Standard.
We have seen the problem on 2012 Standard and 2012 R2 Standard; can you all confirm the versions you are using? Also, DJL, are you saying you can reproduce the problem on demand? If so, can you google the following article (I can't post a link at the mo) and try changing the timeout value to something lower to see if it stops the problem from occurring?
Microsoft network server: Amount of idle time required before suspending session
Thanks Tom.

Sure, I'll go into detail about our setup as well.

Our setup:
• 4x Dell PowerEdge R610s (Intel Xeon X5560, 144GB RAM, Broadcom 1Gbps LOM and Intel 10Gbps X540-T2) running Windows Server 2012 Datacenter Core / Failover Cluster / Hyper-V
• 3x iSCSI SANs - 2x Dell EqualLogic PS6000 and 1x PS6110
• The file servers we see the problem on are Windows Server 2012 R2 Datacenter Core. Their system/boot disks are VHDXs on Cluster Shared Volumes and the file data is stored on SCSI pass-through disks.
• The file servers have the IPv4 stack uninstalled - we run IPv6 only.
• All hardware is running the latest firmware/drivers etc.
• Client workstations are running Windows 7 SP1 and Windows 8.1. All latest updates from Windows Update/WSUS are installed.

To reproduce the problem we:
• Map a share on the server to a workstation. Run IOMETER on the share to stress the server. IOMETER settings are: 2,000,000 sectors, 400 outstanding IO, 512B 100% read access specification, 4 workers. This takes the disk activity up to 100%.
• We then log on a number of Windows 8.1 workstations simultaneously - the users' roaming profiles are stored on the same server/volume.
• We normally log in to about 40 machines at the same time to make sure the problem happens, but it can happen with as few as 1 or 2 machines.

I'd be interested to know what processors you guys are using?
For what it's worth, I'm encountering the same problems at my workplace.
Setup:
• Intel Xeon E5-2420
• VMware ESXi 5.5, build 1474528
• Windows 2012 R2 Essentials
• 2012 R2 as sole Domain Controller, running DNS, AD, DHCP and file/print sharing
• No WSUS set up yet.
Problem seems to manifest most often during file saves in Office 2007, but 90% of our document shuffling is spreadsheets, so it would make sense that's where we see it most frequently.
Once a single user starts having the issue, it starts to show up on others. All of our workstations are running XP or Windows 7. Seems to happen most frequently on Windows 7 clients.
Disabling SMB2/3 seems to have allowed me to pull an individual workstation out of the stall just by waiting for the network share to display properly again, but it's not a solution. It usually takes a few minutes for it to resolve the share contents properly. It's not a good approach, just a stopgap that allows (eventually) saving open files. .tmp files with a random hex value as the file name show up anywhere we've had a workstation stall out during a save. The file itself is basically inaccessible in most cases until the next reboot of the server. Those temp files often can't be addressed, opened, deleted or used in any way without causing Explorer to freeze. Once the server is rebooted, I can collect and delete all of those .tmp files, or open them in their respective programs (Word/Excel/etc).
Once in a while, the original file is also corrupted and can't be opened/used/saved over/replaced/renamed until a server reboot. Not a lot to add to the discussion, just another instance of it happening. I've been following this thread and hoping someone comes up with a solution sooner or later. As it currently stands, I wouldn't recommend deploying either 2012 or 2012R2 as a file server in any circumstance.
Works great for everything else, but this pretty much shuts our entire workplace down, sometimes multiple times a day, since our key software has data files hosted on the file share.

Setup: AD Domain 2008R2, 2 VM DCs (08r2), 1 physical (2012), 2 VM FS and 2 physical. They are set to cluster. So clstr1 and 2 are physical, 3 and 4 are VMs. Fs is the file share server name. Storage network is iSCSI with jumbo frames. Shares on a Dell PS6500. VMs hosted on MD3220 (same network). When this occurs I can move the Node/FS role to a new server and it comes back. I have to reboot the original server if I want it to work again. Problems start with some people and spread.
We do folder redirects and people are quick to tell us of the issues. We brought in consultants to help resolve the issue. The packet capture was very interesting. The server is very responsive to all other traffic, but SMB (1 and 2, no 3) shows crazy delay when negotiating the protocol. So in simpleton terms: client says hey, server acks, client requests SMB access, server waits, 50 seconds later client says forget you, server says fine.
Problems are occurring more because we thought XP (since hotfix installed on server) was part of the issue, and we have migrated to Windows 7 heavily. Now crashes have gone from once a week to 1 to 2 times a day. We increased the resources of the AD PDC (VM) thinking it was pegged (that was yesterday).
Today we had the typical issues right before outright failure. Moved the role and services returned. I will be taking this article to the powers that be so we can start a downgrade (I feel that's the best solution). Oh, and these are brand new builds (fresh installs). We are tight on storage (SAN presenting LUN to servers over iSCSI), so we might have to build a new server and move the LUN to it.
Anyone do this and have any issues?

Team, we are also receiving the same issue a couple of times per day with very similar symptoms. Yesterday I changed the autodisconnect registry settings, which seemed to make the dropouts occur less often; however, the issue did re-occur today, once! I know this article doesn't relate directly to Server 2012 R2, but it could nonetheless have an effect. See KB297684 (I can't post links yet). I would be interested if this makes a difference to the ones who can re-create the issue, as my dropouts occur without warning and I cannot re-create them.
DJL, please keep us updated with your contact from MS, hopefully they find the issue and release a patch.

Thanks - I've sent your case number to the engineer dealing with my case. I'm not sure there is a common denominator other than Windows 2012/2012R2 - I think it's just a bug in the SMB Server or associated components. I'm seeing the problem with clean installs of Server 2012 R2 Core - no AV, backup, monitoring etc. I'm going to stick Server 2012 R2 on a desktop tomorrow and see if I can reproduce the problem on that - if I can then it'll rule out any problems with virtual machines, iSCSI, passthrough disks etc.

Microsoft says they are calling me but I get nothing on my phone. I did a netstat from my DC and found that several computers were connected with hundreds of LDAP sessions.
Today we had issues with people gradually losing connections, and with PowerShell I ran netstat -an | Select-String -Pattern ':389'. I found that the file server was no longer connected to a DC. It is strange, it was like my AD was experiencing a DDoS on LDAP. So I tracked down one of those PCs and ran netstat -b to find out why there were so many connections to the DC on 389.
svchost was running gpsvc.dll with tons of connections. Anyway blah blah blah. Still verifying the fix will work - have to spend the week applying this hotfix to problem machines. I will let everyone know if we are good for a while. Oh, and this is a year-old fix that is not part of updates for Windows 7.
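For anyone who wants to repeat the netstat check on their own DC, here is a minimal Python sketch that counts established LDAP (port 389) connections per client from saved `netstat -an` output. The function name, the sample text and the IP addresses are made up for illustration, and the column layout is assumed to be the usual Proto / Local Address / Foreign Address / State.

```python
from collections import Counter

# Hypothetical excerpt of `netstat -an` output captured on a DC.
SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    0.0.0.0:389            0.0.0.0:0              LISTENING
  TCP    10.0.0.10:389          10.0.0.51:49812        ESTABLISHED
  TCP    10.0.0.10:389          10.0.0.51:49813        ESTABLISHED
  TCP    10.0.0.10:389          10.0.0.52:50001        ESTABLISHED
  TCP    10.0.0.10:445          10.0.0.53:50002        ESTABLISHED
  UDP    0.0.0.0:389            *:*
"""

def count_ldap_clients(netstat_output, port=389):
    """Count ESTABLISHED TCP connections to the local `port`, per remote host."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected TCP row: Proto, Local Address, Foreign Address, State
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        _proto, local, remote, state = fields
        if state != "ESTABLISHED":
            continue
        # rpartition keeps bracketed IPv6 hosts intact: "[fe80::1]:389"
        _local_host, _, local_port = local.rpartition(":")
        remote_host, _, _remote_port = remote.rpartition(":")
        if local_port == str(port):
            counts[remote_host] += 1
    return counts

if __name__ == "__main__":
    for host, n in count_ldap_clients(SAMPLE).most_common():
        print(host, n)
```

A client showing hundreds of sessions in this count would be the kind of machine worth running netstat -b on.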
@TCGood Single server, ~14 workstations plus 5 additional devices. Not a ton of traffic, but at any given time we have 20-30 files open with locks. Mostly small files under 1mb. The busier days with lots of opens and saves seem to coincide with this problem manifesting more frequently. Then again, I came in to the issue this morning and no one had been using the network at all for 12 hours.
Only 3 active logins when I arrived, and 2 of them were in the process of stalling out. All other active file share log ons were unresponsive. In the interest of getting people back to work, I just restarted the server. I'm with you on being at the point where downgrading makes the most sense. But my business won't put up funds for a 2008 R2 license, so in lieu of that, I recently set up a second VM and installed Ubuntu LTS. I'll be setting it up as a simple file server later this week until I see some resolution from Microsoft.
Not ideal, but it can't be helped at this point.

That is bad, at least I am at 600 devices. Traffic is between 5 and 100mb mostly. 1) Are you using Quotas (File Server Resource Manager)? 2) Install disc: where/how acquired? 3) Special settings, i.e. continuous availability, Volume Shadow Copy, ABE etc.?
I am still looking for a common factor. MS has my captures and keeps asking the same questions: 'when the cluster goes down can you access the node?' I can RDP and ping, and it otherwise responds. Admin share and all SMB are unavailable.
I am going to spin up a VM with a different install disc to stress test.

The registry key disables leasing. More info on leasing can be found here:
The following will be logged in the event log when you add the reg key:
File leasing has been disabled for the SMB2 and SMB3 protocols. This reduces functionality and can decrease performance.
Registry Key: HKLM\System\CurrentControlSet\Services\LanmanServer\Parameters
Registry Value: DisableLeasing
Default Value: 0 (or not present)
Current Value: non-zero
Guidance: You should expect this event when disabling SMB 3 Leasing. Microsoft does not recommend disabling SMB Leasing. Once disabled, traffic from client to server may increase since metadata and data may no longer be retrieved from a local cache.
So far this seems to have solved the problem for us. I'll try and get more info out of Msft later once I've fully confirmed it solves the problem for us.

Yes - the key was recommended by Microsoft support, apparently they have quite a few people reporting this issue at the moment ;) and yes, we have been able to reliably reproduce the problem.
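For anyone wanting to script the workaround, here is a minimal Python sketch that only builds the reg.exe command lines for setting or removing a DWORD under the LanmanServer Parameters key named in the event text. The helper names (build_set_dword, build_delete_value) are made up for illustration, and nothing is executed. Whether a Server service restart or a reboot is needed afterwards is not covered in this thread.

```python
# Key path as given in the event log text quoted above.
LANMAN_PARAMS = r"HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters"

def build_set_dword(value_name, data, key=LANMAN_PARAMS):
    """Return the reg.exe argument list that sets a REG_DWORD value."""
    if not isinstance(data, int) or data < 0:
        raise ValueError("DWORD data must be a non-negative integer")
    return ["reg", "add", key, "/v", value_name,
            "/t", "REG_DWORD", "/d", str(data), "/f"]

def build_delete_value(value_name, key=LANMAN_PARAMS):
    """Return the reg.exe argument list that removes a value (back to default)."""
    return ["reg", "delete", key, "/v", value_name, "/f"]

if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them
    # from an elevated prompt on the file server.
    print(" ".join(build_set_dword("DisableLeasing", 1)))
    print(" ".join(build_delete_value("DisableLeasing")))
```

Keeping the delete command next to the set command makes it easy to back the workaround out again once a proper fix turns up.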
Today was quite promising: we brought both our file servers back up to full load and we didn't see any problems once the registry key was set - we haven't managed 20 mins at full load since the problem surfaced so definitely better. I'm still slightly on edge about it though as Wednesday is generally a quieter day for us - if we manage to make Friday afternoon without it resurfacing I'll be more confident. Tcgood - you won't lose all the benefits of SMB 2.0, 2.1, 3.00 and 3.02. The reg key is only disabling leasing, all the other improvements will still be available and 3.02 will still be negotiated (client os version dependant). Having said that I don't consider the reg key a fix, merely a work around. Once I'm confident our servers are stable with the reg key i'll push MS to see what they plan on doing about it permanently. The odd thing is file leasing is in SMB 2.1 which was available in Windows 7/Server 2008 R2 and as far as I'm aware this issue didn't affect 2008 R2.
I guess it could be directory leasing, which was introduced in SMB 3.

We're experiencing the same issue as everyone else on this forum, where our 2012 servers (not all, only two, and we have eight in our environment so far) will not accept SMB connections, but all other connections are fine. Much like everyone else we've tried several things (listed below) and the only temp solution is restarting the server.
Actions taken so far with no success:
1.) Restarting the Server service - the service doesn't start back up, which leads to a reboot anyway.
2.) Verified the following rollups were installed.
3.) Turned off Background Optimization for Data Deduplication.
We are currently working with MS (Case #59226) on this issue and they have no idea. They are just having us collect logs and dumps. We updated the case to a SEV A case, so hopefully we have something today.
So I thought it wouldn't hurt to post on this site as well, to see if anyone else had any thoughts. I'll keep you guys informed as we try things to come to a solution. A possible hotfix that I came across, which deals with SMB2 and SMB3, is for a memory leak. I was wondering if anyone noticed during their issue whether the NonPaged Pool memory was exhausted.

Hi - I came across that article a few days ago and have been monitoring our paged and non-paged pool.
Both seem normal and don't change when the problem is occurring. The following potential solution was posted on edugeek. No help for me as we're running Hyper-V, but interesting nonetheless.
>>Solved the problem on my end. I don't know why it should matter, but I was using an E1000 network adapter on the VMs giving me this issue. I switched them to VMXNET 3 and have not seen the issue re-occur. This was almost a daily problem and it has not happened for the 8 days that I have been running the VMXNET 3 adapters. I know some of you are not using VMware ESXi hosts, but for those that are, give this a shot! ESXi 5.1, Server 2012R2.

I saw that on the edugeek forum as well. I'm going to give it a try this weekend, when I can take down our VMs and make sure the changeover won't affect anything else. Like the guy that posted it, we're using VMware's ESXi 5.5 and assigned our NICs as E1000 network adapters in the Windows environment.
I still have printers and a small subset of user files running on the share, even though a majority of our files are now hosted on a second VM running Ubuntu Server 12.04 (LTS release). That way I can still do some testing to see if solutions work for us. I'll post back if switching the NICs over helps at all.
I hope I can help everyone out with this solution MS have provided me. I have been dealing closely with the Network Team, in particular the most skilled escalation tech of Asia Pacific, who happened to be an expert in SMB. My system has been stable for around 10 days now.
My Environment
• VMware ESXi 5.1.0 build 1123961
• Server 2012 R2
My Symptoms
• Random network dropouts approx 2-4 per day, cannot access any file shares (\\servername\share, \\servername\c$ etc)
• During the dropout Server 2003 boxes can access the shares, all OSes above 2003 cannot (indicating there is an issue with SMB2 and above).
• Restarting would fix the issue immediately
MS have recommended two components to this fix; however, with the VM driver fix applied I was still experiencing the issue. It wasn't until I made the change to srv2.sys that the fix became permanent.
VMware driver: MS believe that this driver (vsepflt) could have been conflicting with the srv2.sys driver. To disable follow this article.
Srv2.sys: In my opinion this is the actual change that fixed the issue. The srv2.sys driver controls SMB 2 traffic at the kernel level. In operating systems pre Server 2012 R2 the driver is set to auto start. Microsoft have changed the functionality for Server 2012 R2 to 'start on demand', and this seems to not be starting gracefully when a request is made on SMB 2 or above.
To change srv2.sys to auto start, open cmd and type
sc config srv2 start= auto
(sc needs the space after start=). A reboot will be required after running this command.
When I talked to the lead network tech about this fix, referred him to this article, how many people it was affecting etc, he advised that in all cases experiencing this issue changing srv2.sys to auto start has worked 100% of the time. Why haven't MS released an official patch, I asked? Because there have not been enough cases to warrant an official fix. I'm very interested in whether this fixes your issues, please mark this as an answer if I bring you success!
Best of luck!

Rant: To be fair, the only reason I haven't officially reported/opened a case with Microsoft is because my company can't afford the service contracts they charge for issues like this.
I understand the need for service contracts during personalized troubleshooting, but it seems counter-intuitive to me in a case like this where it's quite clearly their buggy SMB2 & 3 'upgrades' causing the problems. On topic: Once others post their experiences with this fix, I'll try it on our system as well.
At the moment, I'm verifying stability after changing our network adapters from VMware-assigned E1000 to VMXNET3. So far so good on that fixing our issues, but I had to move the bulk of our fileshare to an Ubuntu LTS release running Samba in order to get a reprieve, so I'm not sure I'm really testing it. If that change that DJL mentioned from the edugeek thread doesn't fix the issue, I'll try your solution, which I'm extremely glad to have as a backup option. Thanks very much for your report on this.
What reg fix? I know you mentioned one above, but I thought it wasn't working.

Follow these instructions to fix your issue (see full description in my post above), let me know how it goes.

I applied the sc config srv2 start= auto command and will report back if it does the trick.
TBH, it might be weeks before I know for sure, just depends on what randomly triggers the freeze. I went weeks with no issues, now it's happening three times a day.

For me, SMB crashed again after the 'fix'. I can't fathom why MS can't figure this out. Seriously, it's their freaking code, you guys have provided dumps, what else do they need? For those that have cases, has anyone actually gotten this escalated to American tech support? In other words, level 3?
sc config srv2 start= auto isn't a fix - I can still reproduce the problem. Eric, I share your frustration. I've been escalated to level 2 and had to basically start from the beginning again with the new engineer. After going through disabling all the advanced NIC features and blaming 3rd party software (there isn't any) we are back to memory dumps and tracing again. I've had to migrate one of our file servers back to 2008 R2 (a 4.5 yr old OS!), but I'm stuck with 2012 R2 on others as we're using Work Folders. We have no AV on our servers, or any other software for that matter.

If you run fltmc in the command prompt, what shows up?

C:>fltmc
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
DfsDriver                               0         405000       0
Cbafilt                                 3         261150       0
Datascrn                                0         261000       0
Quota                                   0         125000       0
npsvctrig                               1          46000       0

Just to give you an idea, this is all I have.
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
npsvctrig                               1          46000       0
Knock on wood, I've been stable thus far. You might want to try disabling file screening, quotas, etc. one at a time to see if things start behaving.
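To compare filter driver lists between an affected and a stable server, a small Python sketch like the one below could diff two saved fltmc outputs. The function names are made up for illustration, the two samples are the listings posted above, and the parser assumes every data row has exactly four whitespace-separated columns (rows with a blank instance count would be skipped).

```python
# fltmc listings as posted in this thread: affected server vs. stable server.
SERVER_A = """
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
DfsDriver                               0         405000       0
Cbafilt                                 3         261150       0
Datascrn                                0         261000       0
Quota                                   0         125000       0
npsvctrig                               1          46000       0
"""

SERVER_B = """
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
npsvctrig                               1          46000       0
"""

def parse_fltmc(output):
    """Return {filter name: altitude string} for each data row."""
    filters = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows: name, instance count, altitude, frame.
        # The header has more fields and the dashed rule is not numeric.
        if len(fields) == 4 and fields[1].isdigit() and fields[3].isdigit():
            name, _instances, altitude, _frame = fields
            filters[name] = altitude
    return filters

def extra_filters(affected, stable):
    """Filters loaded on the affected server but not on the stable one."""
    return set(parse_fltmc(affected)) - set(parse_fltmc(stable))

if __name__ == "__main__":
    # Candidates to disable one at a time, as suggested above.
    print(sorted(extra_filters(SERVER_A, SERVER_B)))
```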
Hi Everyone, I thought I'd jump in here too. I'm glad to find this thread, I've been pulling out my hair for a while on this one. I have a Win 2012 (not R2) server having the same SMB issue described here. I've been struggling with it for some time. I did a complete fresh install of a Win 2012 VM; it worked for a while but then the issue popped back up. Here's my situation: Every day or so (sometimes more, sometimes less) the LANMANSERVER service (SMB/SERVER service) stops responding to Win7 clients. Accessing files from the console of the server via the volume drive letters (C, D, E etc) works fine; just the mapped SMB drives (M, S, H) do not work, and accessing via \\IPADDR\share does not work either.
-Win XP clients seem to be able to access fine, which suggests a problem with SMB 2/3. Trying to stop and restart the SERVER service does not work, it hangs at STOPPING. Only solution seems to be a full reboot.
Server setup: Windows 2012 Standard server running on VMware ESXi 5.5. Direct attached storage, RAID 5 on a Lenovo R700 HW RAID controller. Windows disks are just regular VMDK files from VMware, no passthrough, no iSCSI etc. Saw the same issue on a previous RAID 10 array, with a different controller. Other (non Win2012) VMs on the same datastore have no problems. There has never been an AV on this server. The clients are running ESET v5.
I have a separate Win 2012 domain controller that remains online; 'netstat -b' shows 2 connections on 389 to the DC from the file server before and during the issue. I've never seen the file services on the DC have trouble, but they're not doing much.
Thoughts: Like most here, I don't think it's a hardware issue - there are too many things that don't fit. I believe it's an issue with the SMB/LANMANSERVER/SERVER service or a related driver/service. It's hard to believe that after all this time MS doesn't seem to have a common patch.
Fixes just tried: I have just tried the following fixes suggested here and I'll report back on the success.
--Change the virtual network adapter from E1000 to VMXNET3
--Autostart SRV2 from an elevated CMD prompt: sc config srv2 start= auto
--Unload driver from an elevated CMD prompt: fltmc unload vsepflt
--Change: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\vsepflt\Start to value '4'
--Reboot
***************************
Update: going on 3 weeks, SMB's still working well. I think I've licked it, I was rebooting 1-2 times a day before.

Hi all, This is the response I have had back after they analysed our last memory dump:
The SRV2 threads responsible for processing incoming SMB requests are stuck on an NTFS lock, owned by another thread trying to perform file system IO.
The file system is hung as a couple of IO requests containing 2043 packets to the device "SCSI\Disk&Ven_EQLOGIC&Prod_100E-00\000000" have been blocked for over 17 minutes. This has caused many SRV2 threads to hang in order to ensure serialized access to file system resources.
With no more threads available in the SRV2 queues, a large number of SRV work items are queued up and the system is unable to process new SMB requests.
Suggestion:
• Engage the vendor of the SCSI device (EQLOGIC) in order to verify whether there is any underlying issue with the disk.
• As a workaround, try increasing the number of SRV2 threads using the following registry key on the file server. This will delay the issue in current circumstances but will not guarantee remediation.
Key: HKLM\SYSTEM\CurrentControlSet\Services\LanmanServer\Parameters
Type: DWORD
Value: MaxThreadsPerQueue
The default value for MaxThreadsPerQueue is 20; you can try increasing it to 1024. If the issue still occurs, please collect the kernel dump again as before when the issue occurs, without running IOMeter, and send it for further analysis.
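To make the mechanism in that analysis a bit more concrete, here is a toy Python sketch of worker-pool starvation. It is only an analogy and says nothing about how SRV2 is actually implemented: a plain lock stands in for the NTFS lock, a small thread pool stands in for the SRV2 worker queue, and all names are made up. It shows why a bigger pool only helps while fewer requests are stuck than there are threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def simulate_starvation(pool_size=2, blocked_requests=2):
    """Return (answered_while_stuck, answered_after_recovery) for a new request."""
    fs_lock = threading.Lock()  # stands in for the file system lock

    def blocked_io():
        # A request that needs the file system and waits on the lock.
        with fs_lock:
            return "io done"

    def negotiate():
        # A new request that needs no file system access at all.
        return "negotiate ok"

    fs_lock.acquire()  # the "storage" holds the lock: I/O is stuck
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(blocked_requests):
            pool.submit(blocked_io)
        new_request = pool.submit(negotiate)
        try:
            new_request.result(timeout=0.2)
            answered_while_stuck = True
        except TimeoutError:
            # Every worker is parked on the lock, so the request just sits queued.
            answered_while_stuck = False
        fs_lock.release()  # storage recovers, workers drain the queue
        answered_after = new_request.result(timeout=5) == "negotiate ok"
    return answered_while_stuck, answered_after

if __name__ == "__main__":
    print(simulate_starvation(pool_size=2, blocked_requests=2))
    print(simulate_starvation(pool_size=3, blocked_requests=2))
```

With as many stuck requests as workers, the new request gets no answer until the lock is released, which matches the unanswered SMB Negotiate seen in the captures. One spare worker is enough to answer it, but only until that worker gets stuck too, so raising the thread count delays the stall without removing its cause.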
I haven't tried the registry key yet. It looks like there is an issue elsewhere which is causing this problem.
I'm going to try and capture another memory dump without running IOMeter. I can't see that the issue is with our storage system, as everyone else here is seeing the issue on varying hardware.

I'm experiencing the same issue on a Windows 2008 Server running as a VM within a Windows 2012 Hyper-V cluster. It's running off of Dell servers and an EqualLogic SAN.
Every week we experience the issue where no one can connect to that specific 2008 server's file shares. When trying to stop the server service it hangs and won't shut down. When shutting down the server it also hangs and I have to force a shut off. After it restarts everything is fine.
I have not tried any of the fixes recommended, so please report back if any of the fixes listed are continuing to work. Thank you everyone for sharing.

For what it's worth, our SAN is Nimble Storage, so it's not the SAN vendor (unless they're both messed up). I suspect in your case you have EQL mounted directly via a software initiator? I bet if it was a VMware virtual disk, they'd be blaming VMware. Regardless, the storage vendor is a red herring. To me, it still points back to something messed up in the SMB stack.
I was really hoping they were going to come back with something a little more solid (as I'm sure you were as well). They're troubleshooting the symptom, not the problem. Eric - yes, definitely a red herring - they've tried to blame 3rd party software/hardware several times now. We use the Microsoft iSCSI initiator, so literally the whole stack is Microsoft through from receiving/sending the SMB packets at guest level all the way through to iSCSI at the host level. The only 3rd party code is Intel drivers and the Dell EqualLogic MPIO DSM on the hosts. Regardless I spent a day checking and updating our storage arrays. Dell took diagnostic logging and couldn't find anything wrong (surprise!) so at least that should keep Microsoft happy.
I've now got to try and capture the memory dump again, plus some new storage tracing, but without using IOMeter to recreate the problem. Is anyone else with open support cases getting anywhere? We are also experiencing the same issue. Infrastructure running on Windows Server 2012 on VMware ESX 5.5 Update, IBM FLEX Chassis, Blade X240 and IBM V3700 Storage. Our AD is 2008 R2. FFL 2003 and DFL 2008 R2. We are also running DFS Namespace on top of the file server.
Symptoms:
• Intermittent disconnection of mapped drives
• Cannot access share or slow to open
• Server hangs at restart
Once hard rebooted, the server is up and running. Just logged a call with Microsoft. No relevant log on Windows, ESX or Storage. Anyone found a permanent fix so far?
Irfan Goolab SALES ENGINEER (Microsoft UC) MCP, MCSA, MCTS, MCITP, MCT.

So my case has been escalated again 'as it's more complex than normal' and I've also had the 'we will only spend commercially reasonable efforts on this case going forward' disclaimer. I urge anyone experiencing this problem to open a support case with Microsoft (if you don't have a support contract it'll cost you £240... stick it on a credit card). The more cases they have reported, the more worthwhile it is for them to spend time fixing the problem. I managed to make one of our SQL Servers fall over today - I tried to copy a backup of a database off the server using SMB... big mistake! Yet it's quite happy when SQL hammers the drive with tens of thousands of IOPS and 300MB/s.
SMB is broken! I could be moving to Linux soon! :s

@DJL Moving to Linux is what I chose for the time being. The amount of time (translation: money) our company has wasted on troubleshooting this particular problem was enough to convince me that at least for a simple file share, it made sense to switch over. For you, the time constraints necessary to set up a Linux server are likely more daunting given how many users you have. A simple Samba file share was sufficient for us for the time being. I was hoping 'time being' meant 4-6 weeks for a patch, but now I'm thinking I'll be lucky if a patch is issued by 2015.
I don't understand how this isn't a bigger issue. If it's a fundamental flaw in how 2012 is handling SMB, which it appears to be, I can only assume it's a much more widespread issue than they're admitting to. If I can convince my boss to open a case with M$, I will. We have certain industry applications that ONLY run on Windows, so at some point we're going to NEED this to be fixed. For now, work arounds are enough. @LMosla I think we're a little beyond the whole 'windows event log errors' stage at this point.

I'll be getting a case open soon. We had the issue come back after 2 months. There's clearly something particular that triggers it, but I have not been able to reproduce it manually.

Hi all, So my case has just been archived for the time being.
I have been told that Microsoft have no fix for this at the moment. Apparently they are aware of some issue with SMB 3.02, although I have no further information other than that. Basically as soon as any updates to the relevant dll's are produced the engineer will let me know so I can see if they fix the problem. So basically. I now just have to wait, or dump 2012 R2.
Not great news. I still think it's worth opening support cases if you can as it will at the very least bring more attention to the problem, and they may just discover something that helps isolate the problem.

I opened a case (finally). I had a 2012 R2 server experience 'the issue' on 6/10. It had the May rollup (KB 2955164) installed and rebooted on 6/4, so I don't think it helps. I think I will try the DisableLeasing registry key, it's listed as a workaround. I have quite a few production 2012 R2 file servers out there, and so far 2 have experienced the SMB lockup. I am unable to reproduce even with performing millions of scripted create/update/delete operations from a dozen clients.
REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters /v DisableLeasing /t REG_DWORD /d 1 /f

We have had our 2012 R2 Standard file server VMs in production since February, but only started seeing this issue on June 22, after I installed regular Windows updates.
Restarting server service hangs, will not reboot - have to hard power off (but oddly don't get startup errors on power-on). Same basic symptoms (no sharing, can see files on server via RDP, no pertinent errors in Error Log); here's our spec: Dell 720 hosts, Compellent 10k fiber-attached storage ESXi 5.5, VMXNET3 NICs (since build) We are looking at downgrading to 2008 R2, but I'm worried when that OS will be EOL, forcing us to upgrade to 2012 - maybe they will have a fix by then?;). Don't forget that if you have an AV it possible to be a suspect as well.
If you are still seeing the symptoms, and you've applied all patches (including the one I linked), then you should make sure your server has all software ripped out of it. MS will make you do this anyway if you open a case with them. That said, open a case with them. The more exposure, the higher the likelihood of getting a resolution. Finally, so far (fingers crossed) our server has been stable since the patch. Won't know for sure for at least a couple more weeks though.
Just curious for anyone else having these issues, do you have shadow copies configured? I ask because the last two times we had this issue, all of the previous versions we had for our two data volumes were gone. We generally have at least 2-3 weeks of versions (twice/day).
We've been experiencing this issue ourselves. It seems to have started in the past few months and occurs about every 3-4 weeks on Server 2012 R2. We have Deduplication and Volume Shadow Copies enabled, but I haven't been able to log in yet to verify whether the shadow copies are still present. This is also running on vSphere 5.5 and EqualLogic PS6100, as many people here seem to have as well.
This is the main file server and I'm considering moving the files to a 2008 R2 server. I don't believe it would be too difficult, as I used DFS namespaces for all the shares and drive mappings.
Hi, Just an idea: please make sure that you have the VERY latest firmware and driver version for your NICs, especially the Broadcom ones. I have experienced similar issues on several setups. At the moment I'm not 100% sure (I need more time to be sure the issue does not happen anymore), but it seems that in my case it was an issue with VMQ being defective on Broadcom NICs.
The latest drivers solved the issue. In any case, I would direct you towards checking that your network is completely healthy. Not sure if it applies to you, but in my case there is a CLIENT bug behind all of this (and an old one!).
In my case the issue is that I sometimes lose access to \\server\share, while access to \\server.fq.dn\share remains OK; other times, it's the other way around. Not sure of course if it's the same bug you have.
But the idea is that SMB maintains a list of 'unreachable servers' on the client. This list should time out, but does not. If, because of some network-related issue (in my case, the VMQ feature), you lose the network connection to the server even very briefly, it seems that this list gets filled.
Just so folks know, after the update, we have been rock solid. One other note: we run VMware, and as part of applying the patch we also ensured we were running the latest VM version and tools (which include updated drivers) and all the latest MS patches. We're also on the latest VMware version (5.5), for what it's worth.
That said, if you're still having issues, take your server down to bare bones, check your filter drivers and make sure only the base filters are loaded. Finally, call MS if you're still having issues. That's what I did and what ultimately pointed me to the patch.
Hi, We have a similar issue to everyone else in this thread and have tried everything that has been suggested here (e.g. leasing, srv2 service auto start, and changing from SMB1 & 2), but as yet nothing has fixed the problem.
We ran the same iSCSI file share on Windows 2008 R2, fully patched, for around 4 years with no issues, but decided it was time to upgrade to 2012 R2. That seems to have been a bad choice! We also have a case open with MS, who have taken network traces and diagnostics and seem to think that it is a NIC teaming issue. But from what I have read online, other people have made similar changes after being asked to and it hasn't fixed the problem. Any help/fix for this would be great!
Hi, I am having a similar issue and am curious whether anyone has experienced the same. I have a mixed Windows environment with Server 2003 up to 2012 R2. Same as above, our primary file server was Server 2008 R2 and ran without issue for years. I replaced it with a new Server 2012 R2 install and migrated the shares. All Windows 7 and 2008 R2 or older clients connected to the shares without issue. All of my Windows 8/2012 and newer clients immediately could not access the file server by host name, but could access it by FQDN and by IP. After a reboot of the client, it would successfully connect by host name without issue.
I am running the file server on a Hyper-V 2012 R2 cluster with teamed NICs. The file server is fully patched and has been restarted. If anyone has seen this please help.
My issue was resolved and was linked to Active Directory. We were imaging using WDS and the image (not built by me) had a bug. We use GPP to map drives and printers. The workstations would not let go of the port 389 connection.
We patched almost 300 workstations, all holding hundreds of connections, and the problem went away. The file server could not authenticate users. Everyone above might be experiencing the same symptoms, just with different causes. My recommendation is to look up all possible hotfixes for your SMB and OS version.
Apply and verify.
Just a shot in the dark here: is anyone else who is still having the problem running Extreme-Z IP? We are having very similar issues to the OP, with a very similar config:
1. Server 2012 R2 VM running on local, fiber-attached storage
2. Fresh (never-before used) LUNs connected via iSCSI
3. File sharing periodically stops, but you can still RDP in and see all the LUNs
4. Restarting the server service hangs and you have to fat-finger the VM
5. Fresh workstation build of 8.1 completely updated, also cannot connect when sharing is down, and server itself is patched to current
We thought it might be shadow copies, as we see VSS errors, but even after disabling shadow copies we still have the issue every couple of weeks. The only other thing that affects file sharing (although it shouldn't affect SMB) is Extreme-Z IP, but I didn't see anyone mention it here.
Before I uninstall, I thought I would check here :).
TCGood, Can you please elaborate on what the problem was exactly and how you made that determination? And more importantly, what did you do specifically to 'patch' the clients? We also use WDS (SCCM) to deploy our client OS images, plus GPP for drive mapping, and are wondering if this is similar. We are seeing these issues with 2008 R2 (Hyper-V 2012 guest) and my guess is that MS has updated SMB on it to match client features/patches. Thus we are seeing periods where SMB is unresponsive or VERY slow. The effect is dramatic, as we do folder redirection and the clients become unresponsive when SMB is slow.
In our case we see all network traffic go to effectively zero for a period, then it recovers on its own. We have disabled leases and patched SMB with any and every hotfix we have found. The issue has become less dramatic but is still occurring. Thanks for the help.
So, I logged into the DC and found thousands of :389 connections when running netstat. Our computers were being logged off and back on, but each logoff would leave a connection to :389.
Computers were logged into many many times. So our FS could not authenticate users any longer. And I think these are the link you want, but I cannot remember the 2 I installed on the FS that regarded SMB. Basically users cannot authenticate.
The 15 minute wait our users had when logging in was basically them falling back to using local cached credentials. We were seeing outages several times a day with this. Ran the patch on all Windows 7 machines in the environment (about 265) and bam, we are sailing smoothly again. I have since moved to another location and don't have the details.
Your situation might be different but the outcome is the same. The only netstat info on the DC should be Exchange and FS.
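For anyone wanting to check their own DC for the same pattern, here is a minimal Python sketch that tallies connections to port 389 per remote host from saved `netstat -an` output. The sample addresses are made up for illustration, and the parsing assumes the usual four-column Windows layout (Proto, Local Address, Foreign Address, State); adjust it if your output differs.

```python
from collections import Counter


def count_ldap_connections(netstat_text, port=389):
    """Count TCP connections per remote host whose local port is `port`."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected row: Proto, Local Address, Foreign Address, State
        if len(fields) < 4 or fields[0].upper() != "TCP":
            continue
        if fields[3].upper() == "LISTENING":
            continue  # the listening socket is not a client connection
        local, remote = fields[1], fields[2]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts


# Hypothetical sample data, not taken from the thread.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State
  TCP    10.0.0.5:389           10.0.1.21:49212        ESTABLISHED
  TCP    10.0.0.5:389           10.0.1.21:49213        ESTABLISHED
  TCP    10.0.0.5:389           10.0.1.34:50110        ESTABLISHED
  TCP    10.0.0.5:445           10.0.1.34:50111        ESTABLISHED
  TCP    0.0.0.0:389            0.0.0.0:0              LISTENING
"""

for host, n in count_ldap_connections(SAMPLE).most_common():
    print(host, n)
```

Hosts near the top of the list with hundreds of entries would be the workstations worth looking at first.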
All connections can happen and then disconnect.
We've just been doing some testing with fixed VHDXs and pass-through disks on Server 2012 R2. For us the SMB issue doesn't appear to occur on file servers which use fixed VHDX disks on CSVs rather than pass-through disks. Also, having tested the performance, we are seeing much better performance using VHDXs on a CSV compared to pass-through disks, which is the opposite of what we would expect! We are seeing anything from a 20% to a 300% improvement in IOPS, which is crazy. This is using the same server, same VM, same iSCSI SAN and same settings. We're going to move a file server over to VHDX and see how that goes in production.
Just wanted to pop in: We've had the EXACT same issue on a Dell NX3200 NAS running Windows Server 2012 Storage Server (Windows 2012 R1). About once or twice a month, the LanManServer service becomes unresponsive and kicks off all SMB file shares, while the server remains active on RDP and iSCSI.
You can ping the DNS name and it resolves, so it's not a networking issue, and the DCs see the server and are giving it permissions accordingly. You cannot restart the LanManServer service manually, as it locks up as 'Stopping', and you cannot restart the file server, as it hangs at 'Shutting Down' because it cannot kill the LanManServer service.
I've tried the Microsoft patches made for this issue, no dice. The only current fix I have is hard restarting the server itself. This is incredibly frustrating, even more so that Server 2012 R2 seems to be affected as well, so upgrading the OS will not do anything to resolve the issue. This is affecting our business performance and even though downtime is minimal during these restarts, we are heavily dependent upon this file server. About ready to seek alternative SMB file server solutions.
Hi There, Thanks for the quick lesson on iSCSI best practice. As stated, I have already checked the underlying storage/networking/iSCSI/MPIO etc., and there are no problems at all. The same iSCSI/cluster has been running production VMs for 4 years now without any issues. I find it weird that when the SMB service manages to get locked up like this, I can still browse the files fine on the server. That would rule out any underlying physical storage issue, surely?
One theory I had could be perhaps the use of an iSCSI passthrough disk in the 2008 R2 host to the 2012 R2 guest. This is the only thing unique to this VM, all other guest vm's use vhd files on CSV's.
Yeah, it does not appear to be related to iSCSI, unless the iSCSI service on 2012 is knocking out LanManServer. We have VMs hosted on our 2012 machine via iSCSI and when the SMB shares fail, the iSCSI shares remain responsive and operational. I'm experiencing this too, with Server 2012 (not R2) on a fairly simplistic virtual machine on ESXi 5.1. I end up with the LanmanServer service not responding to stop requests and taskkill.exe hanging when trying to stop it that way. The consensus seems to be to set srv2.sys to start automatically instead of demand-start, disable problematic anti-virus filter drivers, and change NICs on ESXi to use vmxnet3 instead of e1000e. But this doesn't seem to work for everyone.
Further, there's been some finger-pointing going on, blaming external storage and such. Has anyone tried building a generic PC with a generic ATA or SATA disk and supported NIC, and reproducing the problem on that?
This should eliminate third-party external storage or iSCSI as the cause and tell MS to stop finger-pointing. On this same generic platform, what about comparing service pack levels or specific patches? I had this server running since mid-2013 but hadn't had this happen until about mid-2014. My own symptoms don't include any event log 1020 entries in the SmbServer operational log. I get a lot of 1016 errors instead, and these seem to happen with a specific application that's run over the network and no other shares. My own filter list looks like this:
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
VirtFile                                0          400          0
MpFilter                               67        328000         0
Cbafilt                                 3        261150         0
Datascrn                                0        261000         0
Dedup                                  65        180450         0
Quota                                   0        125000         0
npsvctrig                               1         46000         0
System Center Endpoint Protection is installed, but currently has real-time file scanning turned off after I read about virus filter drivers being part of the problem.
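If you want to compare filter lists across several servers, a listing like the one quoted above can be turned into records with a few lines of Python. This is only a sketch: it assumes every data row ends in three whole numbers (instances, altitude, frame), so rows with a fractional altitude would be skipped, and the "instances > 0" check is just one illustrative way to see which filters are actually attached to volumes.

```python
def parse_fltmc(text):
    """Parse fltmc-style output into a list of dicts, skipping header rows."""
    filters = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: data rows end with three integers
        # (instances, altitude, frame); anything else is ignored.
        if len(fields) < 4 or not all(f.isdigit() for f in fields[-3:]):
            continue
        filters.append({
            "name": " ".join(fields[:-3]),
            "instances": int(fields[-3]),
            "altitude": int(fields[-2]),
            "frame": int(fields[-1]),
        })
    return filters


# The filter list quoted in the post above.
LISTING = """\
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
VirtFile                                0          400          0
MpFilter                               67        328000         0
Cbafilt                                 3        261150         0
Datascrn                                0        261000         0
Dedup                                  65        180450         0
Quota                                   0        125000         0
npsvctrig                               1         46000         0
"""

# Filters that currently have at least one attached instance.
active = [f["name"] for f in parse_fltmc(LISTING) if f["instances"] > 0]
print(active)
```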
I do have deduplication turned on in one volume.
We're seeing the same problem with 2012 R2 backed by VMware ESX 5.5. We've recently opened a case with Microsoft and will keep this thread updated on progress. I can say adjusting the srv2 service to auto did significantly reduce the number of warnings (event 1020) present in our SMBServer->Operational log. Since the service adjustment last Friday only 1 event has been logged. The shares would go offline at the times these events were logged. Previously we'd see 200+ events in that log on a bad day.
We already use the vmxnet3 adapters; they've been in place since the start of the issue, and our VMware Tools are fully up to date. Lastly, we see these warning messages across 3 VMs, one of which is running on completely different hardware/storage/network etc. It seems directly related to 2012/2012 R2 and not environmental. Wish Microsoft had better answers. We migrated from a 2008 R2 system that never experienced these problems on the same underlying hardware.
Going to chime in here as we have been experiencing this for almost a year.
Running a 2012 R2 VM on ESXi 5.5 with a mix of RDM and VMDK disks. It used to happen about once a month, not often enough to spend much time troubleshooting, as it was easier to power cycle the VM. It gradually increased to every week, and is sporadic now. At times we will go 3 weeks without the issue, then all of a sudden have it 3-4 times in one day.
We started on 2012, which was an upgrade from 2008. The VM had VMXNET3 NICs. We started with Symantec Endpoint Protection, removed that and used MS Endpoint, then removed AV altogether. Last month we lifted the volumes and put them on a clean build of 2012 R2, using e1000 NICs, and no AV. We still see the issue.
We have given MS netstat, perfmon, and Wireshark packet captures. Then they wanted a memory dump, but because our RDM LUN is greater than 2TB, we couldn't use VMware to snapshot the memory like they wanted. So I figured out how to blue screen the VM using NMI. They said the dump doesn't look like a complete dump, and wanted us to use the vmss2core utility to turn a snapshot into a memory dump. Well, they botched the instructions on that, because they just told us to take a disk-based snapshot, which is not what the utility needs. The May 2014 rollup was already applied. A client-side packet capture just shows an SMB reset being sent after the 60 second default connection time lapses.
Get-SmbOpenFile on the server still shows tons of files open, but no one can connect. I could initially still connect from a 2003 server, but it's not always consistent. We waste so much time with MS and they haven't even provided anything useful. Does anyone have updated info from MS?
Problem is gone! I installed Hotfix.
This did not help. But then I added the registry key mentioned in the KB article as a workaround and since then our file server works without any error.
REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters /v DisableLeasing /t REG_DWORD /d 1 /f
(The server needs a reboot after adding the key!) The problem is gone now for about 2 months *knocking on wood and throwing salt over my shoulder* Give it a try and post if this is working for you.
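If you need to roll the DisableLeasing workaround out to more than one server, a small wrapper can keep the key path in one place. The Python sketch below only builds the `reg add` and `reg query` argument lists for the key at HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters; the function names and the `--apply` switch are made up for this example, and the commands are only executed on Windows when that switch is passed.

```python
import subprocess
import sys

KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\lanmanserver\parameters"


def disable_leasing_command(value=1):
    """Build the `reg add` argument list for the DisableLeasing workaround."""
    return ["reg", "add", KEY, "/v", "DisableLeasing",
            "/t", "REG_DWORD", "/d", str(value), "/f"]


def query_leasing_command():
    """Build the `reg query` argument list to check the current value."""
    return ["reg", "query", KEY, "/v", "DisableLeasing"]


if __name__ == "__main__":
    # Always show what would be run.
    print(" ".join(disable_leasing_command()))
    # Hypothetical --apply switch: only touch the registry when asked to.
    if sys.platform == "win32" and "--apply" in sys.argv:
        subprocess.run(disable_leasing_command(), check=True)
        subprocess.run(query_leasing_command(), check=True)
```

As noted above, the server still needs a reboot before the setting takes effect.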
Cheers, Stefan. For what it's worth, I'm still struggling with this too. Still working around it with a samba file share on a second VM, but I have a few files shared on 2012R2 and still see this intermittently. It also screws up the print server when it rears its head. Can't stop or restart server service on 2012R2 so the only option is a reboot. No Symantec in the topology, and it's a mostly clean install of 2012R2 as a VM on ESXi 5.5. Very little software has been installed besides that on the server, which leaves me pretty confident this is still a Microsoft problem.
More specifically, it's an SMB related Microsoft problem. This link describes our issues almost exactly: neither the workaround nor the patch has helped, as it did for Stefan. At this juncture, I'm about ready to scrap this VM and start over. Hope that in the next iteration whatever is causing it doesn't show up.
Did anyone with an open case ever get ANY resolution to this? Since last February, I had this problem happen to me once. At that point I made the service change (srv2.sys to autostart), VM NIC change (swap E1000E with VMXNET3), and disabled System Center Endpoint's real-time checking. Those changes held up for a while, until I re-enabled SCEP real-time checking after routine updates and it happened about six weeks later.
After that, I re-disabled SCEP real-time checking. It held up for a while since. During the most recent update cycle I also disabled SMBv2 (and thus v3 as well) from a PowerShell command. I haven't noticed a major performance hit but now an unrelated problem has come up in that some applications won't open more than one file from the same network share on that server.
I'm going to migrate my file share volume from a 2012 VM to a 2012R2 VM after discovering tha.
Article Summary: OS installation on a VM hosted on a Dell PowerEdge system may fail, when the hypervisor is not prepared to support the installation. This article explains how to prepare hypervisor for using ROK media.
Symptoms
During the Operating System installation on the virtual machine, the following message appears and the installation is not possible:
System verification Failure [English] This hypervisor is not prepared for the Reseller Option Kit(ROK) media install or you are trying to install the ROK media on a non-Dell system. To prepare the hypervisor for the ROK install, please follow the steps mentioned in the ROK document at and then try again Press OK to restart the virtual machine
This error message is also displayed in French, German and Spanish. Clicking on restart will result in the same error message.
Cause
You can install the ROK media only on the Dell PowerEdge systems and virtual machines (VMs) hosted on the Dell systems. If you try to install the Windows Server operating system (OS) on a non-Dell PowerEdge system or on a VM hosted on a non-Dell PowerEdge system, the installation may fail, and the system verification failure message is displayed. When the installer tries to access the BIOS of the machine to verify that the installation occurs on a Dell server, the hypervisor provides a nonmatching BIOS generating the error message.
Solution
The solution is to allow the installer to access the physical BIOS of the server. Please follow the procedures depending on the hypervisor that you are using to prepare for installation.
Windows Hypervisor HyperV
To prepare your Microsoft hypervisor for using ROK media:
• Insert the ROK media, and go to: ROK.
• Run hyperv_prep.vbe on the Windows Server OS running Hyper-V, and follow the on-screen instructions.